
KHR_audio_emitter #2137

Open
robertlong wants to merge 20 commits into KhronosGroup:main from omigroup:KHR_audio

Conversation

@robertlong
Contributor

@robertlong robertlong commented Mar 31, 2022

KHR_audio_emitter

This extension adds the ability to store audio data and represent both global and positional audio emitters in a scene.

This extension is intended to unify the OMI_audio_emitter and MSFT_audio_emitter extensions into a common audio extension.

Members from the Open Metaverse Interoperability Group, Microsoft, and other Khronos members met on 3/23/2022 to discuss how we might unify OMI's work on audio emitters and Microsoft's previous efforts. OMI_audio_emitter has a subset of the MSFT_audio_emitter extension's features, eliminating features that might be out of scope for a more general extension. KHR_audio_emitter also has some spec changes from the original OMI_audio_emitter requested by Khronos members. There are still some outstanding questions on audio formats, MP3 vs WAV, and what features within these formats should be supported.

We're looking forward to working together on this extension and bringing audio to glTF!

@najadojo

najadojo commented Apr 1, 2022

Love that this space is getting more attention. I am a little concerned about the elimination of a number of features that MSFT_audio_emitter has. Some limitations are understandable, such as the integration with animation. I expect the thinking is that the model and its audio are driven externally by the loading system. Though this approach may give an artist less control, or at minimum a more complicated workflow. Other limitations I'm not sure I understand, such as limiting to one emitter per node. This would just lead to having a number of child nodes for scenarios that require it.

I'd like to remind folks of a demo video we produced for MSFT_audio_emitter that was completely data driven.

@lyuma

lyuma commented Apr 7, 2022

I wanted to chime in here on the randomized audio clips. I'm generally opposed to having specific fixed-function features in glTF where not necessary, because those features will have to be implemented by everyone and maintained, even in the future when extensions such as KHR_animation2 have been developed more.

That said, the demo you link is really cool. I can't remember the original document but I did see that randomizing environment audio adds a lot to immersion, compared to having a looping set of tracks that play one after the other.

So regarding randomized clips, I'm a bit torn here. I might also suggest maybe some middle ground: rather than require a weighting/randomization system, allow multiple clips per audio emitter, but leave it up to the application or future extensions to implement the randomization / support selecting audio clips. (and otherwise allow just playing the first clip).

As for multiple emitters per node, I would suggest this is not necessary: it would be very easy to add a child node (with no translation, rotation or scale) with another emitter on it. This is similar to how each node only has one mesh, but multiple meshes can be easily added as child nodes.

@UX3D-nopper
Contributor

Updated the example C++ source code for KHR_audio here: https://github.com/ux3d/OMI/tree/KHR_audio

@robertlong
Contributor Author

Yeah we talked about this extensively in the OMI glTF meeting yesterday. I'm personally on the side of making this extension as simple as possible. I also realize now that it is a bit odd that we allow multiple emitters on the scene, but not nodes.

Given this feedback my recommendations are:

  1. Maybe we want to expand the spec to add basic mixing support via supporting multiple audio sources on an audio emitter, each with their own gain value.

Here's a proposal with multiple inputs per emitter. There is a gain value on the emitter as well as on each of the sources, as you would usually see in a mixer. In this proposal the only inputs are audio sources, but you could imagine other audio processing nodes in there as well, similar to the WebAudio API. I think we want to make this spec as simple as possible without limiting future extensions from adding more features, and adding mixing to the core spec isn't a huge ask.

{
  "emitters": [
    {
      "name": "Positional Emitter",
      "type": "positional",
      "gain": 0.8,
      "inputs": [0, 1],
      "positional": {
        "coneInnerAngle": 6.283185307179586,
        "coneOuterAngle": 6.283185307179586,
        "coneOuterGain": 0.0,
        "distanceModel": "inverse",
        "maxDistance": 10.0,
        "refDistance": 1.0,
        "rolloffFactor": 0.8
      }
    }
  ],
  "sources": [
    {
      "name": "Clip 1",
      "gain": 0.6,
      "playing": true,
      "loop": true,
      "audio": 0
    },
    {
      "name": "Clip 2",
      "gain": 0.6,
      "playing": true,
      "loop": true,
      "audio": 1
    }
  ],
  "audio": [
    {
      "uri": "audio1.mp3"
    },
    {
      "bufferView": 0,
      "mimeType": "audio/mpeg"
    }
  ]
}
  2. We may want to only allow for one global emitter in light of what I'm proposing with mixing audio. You'd just add multiple inputs for the global emitter.

  3. For clip randomization I do think we should be introducing that through some other more generic event-based system. Having bespoke behavioral features in the base spec doesn't make a whole lot of sense to me. Clip randomization sounds very useful, and the demo shown above is awesome, but we could also use this feature for randomizing active idle animations, or a material color.

  4. KHR_animation2 should be able to target some of the properties in this spec to play one-shot clips multiple times, or start looping a clip after a certain frame.

  5. We might want to add loopStart and loopEnd properties to audio sources such that a single audio file can be used and the source samples a section from that file.

  6. We need to add language to the spec to clarify that positional audio emitters "MUST" use a mono audio source.
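A hypothetical shape for the loopStart/loopEnd idea (property names and second-based timing are assumptions, not finalized spec):

```json
{
  "sources": [
    {
      "name": "Ambience Section",
      "audio": 0,
      "playing": true,
      "loop": true,
      "loopStart": 2.5,
      "loopEnd": 7.5
    }
  ]
}
```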

@antpb

antpb commented Apr 20, 2022

Just want to voice a +1 to Robert's proposed changes above, particularly the bits around having multiple sources in the array. I've implemented the OMI audio spec in my WordPress plugin ( https://3ov.xyz ), and this change is early enough and makes enough sense that it will not impact my current users.

Also a +1 on having one global emitter and having inputs that feed into that global.

I don't feel strongly about loopStart and loopEnd but do see the benefits.

@UX3D-nopper
Contributor

UX3D-nopper commented Apr 21, 2022

We got feedback on KHR_animation2. It is renamed to KHR_animation_pointer and we are making good progress on it.
#2147


#### `playing`

Whether or not the specified audio clip is playing. Setting this property `true` will set the audio clip to play on load (autoplay).

Is there a particular reason for naming this like a state ("playing") vs. naming it as what it does ("autoplay")? Seems that hints at possible implementation details ("changing this should change play state") but that isn't mentioned.

Contributor Author


This is a bigger discussion about how animation, audio, etc. should be treated on load. Animations are currently just keyframe data and it's up to the implementation to figure out how to play the animations. https://www.khronos.org/registry/glTF/specs/2.0/glTF-2.0.html#animations

So this raises the question of whether playing or autoPlay or even loop should be included in the spec.

@najadojo @bghgary this also goes against the direction of the MSFT_audio_emitter where there are ways to specify runtime behavior.

Maybe runtime behavior should be left out of this spec and implemented in another?


@hybridherbst hybridherbst Apr 29, 2022


I see your point regarding playback behaviour!

If the glTF is purely seen as "data setup", then it might still be desirable to have a way to connect audio clips to animations - e.g. saying "this audio emitter belongs to that animation" (potentially: "at that point in time") would be pure data. This would be similar to how a mesh + material are connected to a node.

What do you think in which direction this connection should be made? Attaching an emitter to a clip or vice versa? I think on the clip would make somewhat more sense:

"animations": [
    {
      "channels": [...],
      "samplers": [...],
      "name": "SimpleCubeAnim",
      "extensions": {
        "KHR_audio": {
          "emitters": [
            {
              "id": 0,
              "offset": 0.0
            },
            {
              "id": 1,
              "offset": 1.3
            }
          ]
        }
      }
    }
  ],

In this example, two emitters belong to this animation clip, viewers would interpret that as they "loop together", and one emitter has a slight offset. Note having the same emitter play multiple times during the loop would be possible by having multiple references to the same ID with separate offsets.

What do you think? I think that would preserve the "glTF is purely data" aspect by giving hints about how things are connected, not how things should be playing.

Contributor Author


During the OMI glTF meeting we discussed this a bit. I think synchronizing audio with an animation should also be made part of another spec. AFAIK synchronizing audio with animations isn't necessarily the best way to deal with this scenario. We should talk more about autoPlay and loop though. Should this be included in the spec? Should we go into more depth on playing/mixing audio? It'd be good to get some feedback from others on this.

Contributor

@bghgary bghgary May 12, 2022


I think, at a minimum, we need to define both how the glTF carries the audio payload (including metadata) and how to play the audio, whether that is one or more specs. If this spec only defines the former and that's all we define, it will be hard to know if the spec works and it will be hard to demo.


@hybridherbst hybridherbst May 13, 2022


Would you agree that the options are

  • adding info about play behaviour to the audio data itself
  • connecting audio and animations in some way (either from audio to animation or from animation to audio)

or do you have another one in mind?

I agree that without any means of knowing how the audio is relevant to the scene, viewers won't be able to do anything with it - e.g. at a minimum I think tools like model-viewer should have a way to infer what to do with a file that has multiple animation clips and multiple audio assets (could be 1:1, could be different). A counter-argument would be saying "well for this case, if there's one animation and one audio clip that will be played back, everything else is undefined" (still allowing for cases such as this) but I'm not a huge fan of undefined-by-design...

Contributor Author


During the OMI glTF meeting we agreed that playing behavior (autoPlay and loop) should be in this spec to define the minimum behavior to play sounds. However, connecting audio and animations should be delegated to an extension like #2147

Contributor

@bghgary bghgary May 19, 2022


I think we need to think about the corresponding spec(s) that define how the audio will play before completing this spec. KHR_animation_pointer spec will be able to cover some things (like modifying parameters for looping sounds), but it's probably not enough (e.g. one-shot sounds triggered from an animation).

Comment on lines +181 to +187
#### `coneInnerAngle`

The angle, in radians, of a cone inside of which there will be no volume reduction.

#### `coneOuterAngle`

The angle, in radians, of a cone outside of which the volume will be reduced to a constant value of `coneOuterGain`.


Might want to add a note here that setting this to some value > 2 * PI (the max allowed value) will turn this into a spherical audio source. It's implicit from the defaults but could be explained explicitly here.

Contributor Author


I agree it should act as a point audio source / non-directional source when set. It's worth noting the WebAudio API Specification doesn't specify this detail though. What's also missing is the behavior when coneOuterAngle is less than coneInnerAngle. We should check on this before adding these details.
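For reference, the cone curve under discussion can be sketched in Web Audio style as follows. The non-directional behavior for coneInnerAngle >= 2π falls out of the math here; treating that as the specified behavior is the proposal above, not something the Web Audio spec states explicitly.

```javascript
// Sketch of Web Audio-style cone attenuation. angleToListener is the angle
// (radians) between the emitter's forward direction and the listener.
function coneGain(angleToListener, coneInnerAngle, coneOuterAngle, coneOuterGain) {
  const absAngle = Math.abs(angleToListener);
  const halfInner = coneInnerAngle / 2;
  const halfOuter = coneOuterAngle / 2;
  // With coneInnerAngle >= 2*PI, every direction is inside the inner cone,
  // so the emitter effectively behaves as a non-directional source.
  if (absAngle <= halfInner) return 1.0;       // inside inner cone: full volume
  if (absAngle >= halfOuter) return coneOuterGain; // outside outer cone: constant gain
  // Between the cones: linear interpolation from 1.0 down to coneOuterGain.
  const t = (absAngle - halfInner) / (halfOuter - halfInner);
  return 1.0 + (coneOuterGain - 1.0) * t;
}
```

This also makes the undefined case visible: if coneOuterAngle < coneInnerAngle, the interpolation divides by a negative range, which is exactly the gap that needs spec language.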

@hybridherbst

One area that I'd love to see discussed here is behaviour under root scale for AR, as this is something currently broken in both SceneViewer and QuickLook (since it was unspecified, I guess). Happy to provide sample files.

Consider the typical scenario of placing a model in AR with model-viewer, and the model has spatial audio.
Should scaling the model down to 10% of the size

  • assume that the model is now small, but the distance to the viewer is approximately the same?
  • assume that the viewer is now large and the model hasn't changed scale, and thus the distance to the viewer is now much larger?

Unfortunately such considerations were omitted from lighting in glTF, and thus lighting in AR is also "broken everywhere" right now :)

To explain why that's not trivial, here's my take (and happy to take the lighting discussion elsewhere, here for context):

  • audio volume should work as if the model became smaller (if I scale an AR radio down, I want to hear it at the same loudness)
  • audio spatialization should work as if the viewer got bigger (if I have 3 spread out emitters in a space and scale that space down, I still want clear spatial separation between these sources)
  • lighting effects should work as if the viewer got bigger (if I scale a lamp with a spot light down, I don't want the light to become brighter, it should look the same just smaller)


An object containing the positional audio emitter properties. This may only be defined if `type` is set to `positional`.

### Positional Audio Emitter Properties

@hybridherbst hybridherbst Apr 25, 2022


Are doppler effects considered implementation details? Might want to explicitly call this out here (e.g. animating a node with an audio source, fast viewer movement, ... might cause doppler effects on positional sources).

Contributor Author


We may want to add a note, yeah. But perhaps audio-listener oriented properties / effects should be defined in another series of extensions?


I wouldn't think these should be specified here, and probably also not in another extension, as its very application-specific. "Doppler effect for audio" is kind of in the same realm as "bloom effect for rendering" in my opinion. The note would be enough.

Contributor Author


I think we can add language such as "Implementors may choose to add effects to spatial audio such as simulating the doppler effect." 👍

@donmccurdy
Contributor

One area that I'd love to see discussed here is behaviour under root scale for AR, as this is something currently broken in both SceneViewer and QuickLook (since it was unspecified, I guess).

If needed, perhaps this could be specified under the model3d XMP Namespace in KHR_xmp_json_ld, alongside model3d:preferredSurfaces for AR? I'm nervous to bring much wider-scoped issues ("how to compose the glTF asset within a larger scene of unknown format for unknown purpose") into KHR_ extensions, we try to define what the content "means" rather than how various categories of applications (AR, HTML viewer, game, etc.) might use it.

@hybridherbst

hybridherbst commented Apr 25, 2022

I generally agree, but

we try to define what the content "means"

is exactly what I'm trying to say: I think an extension adding audio (or lights) should talk about how that audio works. AR is a strong use case for glTF, and it would simply be "undefined" (every viewer would do something different) if not explained in the extension that adds audio, in my opinion. It would have been better if such topics ("behaviour under different types of scales") had been part of the core spec, of course - though if they had been, they'd probably have forwarded responsibility to extensions anyway...

I can just say: SceneViewer and QuickLook (glTF and USDZ, respectively) allow for the usage of audio and lights, and both haven't defined/thought about how it behaves under AR scale, so right now we can't use it for many projects where we'd like to. If it stays unspecified for KHR_audio, it's immediately the same issue.

I'll take a deeper look at KHR_xmp_json_ld! Could you let me know where I find more about model3d:preferredSurfaces and other agreed-upon properties? For me that extension so far sounded like "structured extras", but sounds like there's actually more to it?
(Edit: only hint I found was in the Khronos blog)

@donmccurdy
Contributor

Could you let me know where I find more about model3d:preferredSurfaces and other agreed-upon properties?

Probably the place to start would be Recommendations for KHR_xmp_json_ld usage in glTF.

I think I'm worried that "different types of scale" as a concept has a lot to do with emerging ideas of an "AR viewer" vs. other application types, and that these definitions may change drastically on a timeline of 1-2 years or even months. Embedding these context-specific requirements into KHR_punctual_lights (for example) could cause the extension to become outdated far sooner than it might otherwise. With an XMP namespace there is a proper versioning system and more flexibility to evolve or adapt to specific contexts. I suspect the same applies to KHR_audio.

@UX3D-nopper
Contributor

UX3D-nopper commented May 19, 2022

We need to add language to the spec to clarify that positional audio emitters "MUST" use a mono audio source

Also, that global audio emitters "MUST" use a stereo source. Technically, mono would work as well, but this would probably just cause complaints from end users.

@robertlong
Contributor Author

Technically, mono would work as well, but this would probably just cause complaints from end users.

Yup that makes sense. We could remove the stereo audio source requirement from the global emitter.

we try to define what the content "means" rather than how various categories of applications (AR, HTML viewer, game, etc.) might use it.

We spoke about this during the OMI glTF meeting, and KHR_audio should be defined within the frame of reference of the glTF document. We agree that the document should define what the content "means", but the behavior of scaling the content should be up to your application.

However, if an animation in the document is controlling the scaling of the node we should define that behavior and maybe that should inform best practices in this AR viewer use-case.

So in the case of maxDistance, what should happen with it when scaled down to 10% of its original scale?

maxDistance

The maximum distance between the emitter and listener, after which the volume will not be reduced any further. maximumDistance may only be applied when the distanceModel is set to linear. Otherwise, it should be ignored.

Should this respect node scale?

Note that in Unity, Audio Emitter scale is separated from the GameObject scale. But in the MSFT_audio_emitter spec, it does mention that audio attenuation should be calculated in the emitter space.
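For reference, the three distanceModel curves referenced above can be sketched following the Web Audio API formulas; whether the distance fed in is measured before or after node/scene scale is precisely the open question here.

```javascript
// Sketch of the Web Audio API distance attenuation formulas.
// distance is the emitter-to-listener distance in whatever space the
// implementation chooses (the scaling question discussed above).
function distanceGain(model, distance, refDistance, maxDistance, rolloffFactor) {
  const d = Math.max(distance, refDistance); // no boost inside refDistance
  switch (model) {
    case "linear": {
      // maxDistance only participates in the linear model.
      const dc = Math.min(d, maxDistance);
      return 1 - rolloffFactor * (dc - refDistance) / (maxDistance - refDistance);
    }
    case "inverse":
      return refDistance / (refDistance + rolloffFactor * (d - refDistance));
    case "exponential":
      return Math.pow(d / refDistance, -rolloffFactor);
  }
}
```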

@UX3D-nopper
Contributor

UX3D-nopper commented May 20, 2022

So in the case of maxDistance what should happen with it when scaled down to 10% of it's original scale?

maxDistance
The maximum distance between the emitter and listener, after which the volume will not be reduced any further. maximumDistance may only be applied when the distanceModel is set to linear. Otherwise, it should be ignored.

Should this respect node scale?

I suggest, that the behaviour is like in the KHR_lights_punctual extension:
https://github.com/KhronosGroup/glTF/tree/main/extensions/2.0/Khronos/KHR_lights_punctual
"The light's transform is affected by the node's world scale, but all properties of the light (such as range and intensity) are unaffected."

So, the final position of the audio is affected by scale, but not the properties of audio.

@hybridherbst

This is unfortunately exactly what breaks all existing viewers in Augmented Reality mode, where users can "make the content smaller or bigger" without any agreement on what that means – is it the viewer getting bigger or the content getting smaller or a mix of both. See my comment above for some more descriptive cases.
"node's world scale" is well defined inside one glTF scene but isn't well defined when placing that scene in another context. There's an additional "scene scale" of sorts.

@UX3D-nopper
Contributor

UX3D-nopper commented May 20, 2022

This is unfortunately exactly what breaks all existing viewers in Augmented Reality mode, where users can "make the content smaller or bigger" without any agreement on what that means – is it the viewer getting bigger or the content getting smaller or a mix of both. See my comment above for some more descriptive cases. "node's world scale" is well defined inside one glTF scene but isn't well defined when placing that scene in another context. There's an additional "scene scale" of sorts.

We should really define this consistently inside glTF. A pure glTF viewer should behave like I described.
I get your point, and your use case is using glTF in another application. There it is not glTF defining the behaviour, it is the AR viewer. And next time, you import the glTF into a game engine, and there it is also different.

BTW, Blender behaves this way, e.g. you can try it out with a point light radius. It stays the same, independent of the scale.

@hybridherbst

Personally I think this says "we should only care about what happens inside the ivory tower"...
I'm not sure what a "pure" glTF Viewer is, everything has a context.

I created a new issue to track this problem outside of just KHR_audio:

@donmccurdy
Contributor

donmccurdy commented May 20, 2022

See #2137 (comment) — I think it is fine to try to define what you're asking for, but I do not think that requirement belongs in KHR_audio or KHR_lights_punctual. A specification that can properly deal with the ambiguities of novel and rapidly changing contexts is required to do what you suggest, and that will burden these extensions too much.

@hybridherbst

hybridherbst commented May 20, 2022

Hm, I don't think it belongs tucked away in some unspecified metadata either. Otherwise a case could be made that 95% of extensions would be better suited as metadata.

If you read through the issue I opened, I explicitly mention there that

  • ideally the discussions and definitions around this belongs elsewhere (not into either KHR_audio or KHR_lights_punctual)
  • extensions should be encouraged to call out how they intend to behave
  • I don't believe this needs to be normative.

@robertlong
Contributor Author

robertlong commented Jul 20, 2022

Hey everyone, we've been discussing this spec over the past couple months and we have a few changes that allow for some of the requested features above.

First is splitting audio data and audio sources. Audio Data is similar to an Image where it has the uri or bufferView and mimeType properties. Audio Sources are intended to be used for controlling playback state on one or more Audio Emitters. An Audio Emitter can also reference multiple Audio Sources which allows you to mix multiple audio tracks through an emitter.

We've also changed playing back to autoPlay which signifies that an audio source should play on load.
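Assuming the structure described above (the file name, gain values, and exact property placement are illustrative, not the finalized schema), a document might look like:

```json
{
  "audio": [
    { "uri": "ambience.mp3" }
  ],
  "sources": [
    {
      "name": "Ambience",
      "autoPlay": true,
      "loop": true,
      "gain": 0.8,
      "audio": 0
    }
  ],
  "emitters": [
    {
      "name": "Global",
      "type": "global",
      "gain": 1.0,
      "sources": [0]
    }
  ]
}
```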

We'd love to get everyone's feedback on these changes!

Also, I believe we (The Matrix Foundation) have submitted our application to join Khronos. So I will hopefully be around to participate during the working group meetings in the near future. Hopefully that helps move this spec (and others) forward a little faster.

@mikeskydev

I note KHR_audio_emitter is no longer mentioned on the roadmap https://github.com/KhronosGroup/glTF/tree/main/extensions#in-progress-khronos-and-multi-vendor-extensions-and-projects, and instead it's now KHR_audio_graph. Is this PR superseded?

@aaronfranke
Contributor

@mikeskydev OMI is maintaining the PR. We have enhanced it this year with a clearer spec, an example file, and an example implementation.

We are waiting on a response from Khronos: #2421 (review) (we don't have any communication channel with them other than GitHub).

@docEdub

docEdub commented Dec 3, 2024

Has there been any discussion around adding a property to sources for playback speed? Something like pitch, playbackRate, detune, or some combination for coarse- or fine-grained modifications. This would make audio elements more reusable.

@robertdorn83

Can you please enhance the spec to clarify what controls the playback? I only found this in a comment here. It's not clear whether an audio source or the emitter controls this. It's important for Interactivity :)
I'm not sure if this is the right place, but maybe also extend the spec with Interactivity audio nodes? Pointers already exist, so maybe it's also a good idea to add these.

@aaronfranke
Contributor

@pfcDorn There is the autoPlay boolean, and the ability to loop audio, but otherwise how playback is controlled is not specified. There was discussion in the past of a playing boolean, but it was decided against, I don't remember why. Aside from that, for the ability for KHR_interactivity to play like a function call or receive events like a finished signal, I am still waiting on a response from Khronos to explain how this is supposed to work: #2293 (comment)

@docEdub

docEdub commented Apr 8, 2025

Is there a strong reason why the "p" in autoPlay is capitalized? I think it's typically considered one word, so the "p" would be lowercase.

@robertdorn83

A finished signal would be a Flow Output on an "audio/play" node (like the Done flow output on flow/setDelay), and as a value input this node could take an integer for the audio source index/id to be played.

@aaronfranke
Contributor

aaronfranke commented Apr 9, 2025

@docEdub I don't remember the particular reason. It was introduced in omigroup/gltf-extensions#88. We could switch the casing if that is desired. Godot uses all lowercase for its autoplay property, so it would be consistent there. EDIT: Done, we have switched the casing in omigroup/gltf-extensions#248

@pfcDorn I think a finished signal is not quite equivalent to a Flow Output on the node that plays, because there may be multiple points in the script that lead to playing audio. I would expect the Flow Output to only run after that specific node runs. Also, does this imply that audio/play would be a blocking operation? What if an interactivity script wants to start playback of multiple audio emitters in parallel? Anyway, would love to discuss this further, you are also welcome to submit a pull request to the OMI gltf-extensions repo if you have a good idea of what this will look like.

@robertdorn83

A node can contain multiple flow outputs. So why not a normal "out" flow which gets triggered right after playback starts, and a second "out" flow for "done"?

@docEdub

docEdub commented Apr 9, 2025

@docEdub I don't remember the particular reason. It was introduced in omigroup/gltf-extensions#88. We could switch the casing if that is desired. Godot uses all lowercase for its autoplay property, so it would be consistent there.

Yes, I think autoplay is best for consistency. The Babylon.js audio engine uses the lower case version, too. Thanks!

@coderofsalvation

coderofsalvation commented May 12, 2025

Great effort!
KHR_audio_graph looks nice on paper, but might also be more complex / implementation-heavy.
Question: is there any experimental Three.js code to play with KHR_audio_emitter?
I'd really love to encode/decode an MP3 file into a bufferView (any example of doing that via JavaScript?)

@aaronfranke
Contributor

aaronfranke commented May 16, 2025

@coderofsalvation Yes! https://omigroup.github.io/three-omi/

KHR_audio_emitter is made with the Web Audio API in mind (used by Three.js inside its Audio objects), so it should be easy to implement. There is an ancient TypeScript Three.js implementation here, which has a lot of problems. So, I decided to spend the last few days re-implementing it in Three.js in simple JavaScript. You can view the source code here and try out a demo here: https://omigroup.github.io/three-omi/

Also, there is a Godot implementation and a Unity implementation.

As for exporting MP3 files into a bufferView, I think this is impossible without bringing in an entire MP3 encoding library. A quick Google search revealed this: https://stackoverflow.com/questions/43501286/how-to-convert-an-audiobuffer-to-a-mp3-file
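Worth separating two cases: converting a decoded AudioBuffer back to MP3 does need an encoder, but embedding an already-encoded MP3 file's bytes into a glTF bufferView does not. A minimal Node.js sketch (the helper name and the JSON shape are illustrative assumptions, not the finalized KHR_audio_emitter schema):

```javascript
// Sketch: embed raw MP3 bytes as a glTF buffer + bufferView and reference
// them from the audio extension's data array.
function embedAudio(gltf, mp3Bytes) {
  // Store the bytes as a base64 data: URI buffer (simplest embedding;
  // a GLB binary chunk would avoid the ~33% base64 size overhead).
  gltf.buffers = gltf.buffers || [];
  const bufferIndex = gltf.buffers.length;
  gltf.buffers.push({
    byteLength: mp3Bytes.length,
    uri: "data:application/octet-stream;base64," +
      Buffer.from(mp3Bytes).toString("base64"),
  });

  gltf.bufferViews = gltf.bufferViews || [];
  const bufferViewIndex = gltf.bufferViews.length;
  gltf.bufferViews.push({
    buffer: bufferIndex,
    byteOffset: 0,
    byteLength: mp3Bytes.length,
  });

  gltf.extensions = gltf.extensions || {};
  const ext = gltf.extensions.KHR_audio_emitter =
    gltf.extensions.KHR_audio_emitter || { audio: [], sources: [], emitters: [] };
  ext.audio.push({ bufferView: bufferViewIndex, mimeType: "audio/mpeg" });
  return ext.audio.length - 1; // index usable from a source's `audio` property
}
```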

@coderofsalvation

Thanks, this is very exciting news ❤
How did you end up getting the audio file into boombox.glb?
Can you elaborate on how I/others can do that?

@aaronfranke
Contributor

@coderofsalvation I used Godot Engine with this PR to generate the files.

@Shinmera

It seems odd to me to choose mp3 as the only format, when mp3 has issues with gapless looping (due to chunk padding).

@aaronfranke
Contributor

@Shinmera What is the alternative? The Khronos group will not accept WAV as part of KHR_audio_emitter or any other Khronos-approved glTF extension because WAV has no formal specification. There have also been discussions of Opus.

@Shinmera

Both ogg/vorbis and ogg/opus (preferred over vorbis these days and has wide browser support at least) allow gapless looping. For a trivial, formally specified, uncompressed PCM format, SF3 exists though there's barely any existing support for it at this time.

@CLAassistant

CLAassistant commented Jan 13, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 3 committers have signed the CLA.

✅ aaronfranke
❌ robertlong
❌ antpb
You have signed the CLA already but the status is still pending? Let us recheck it.
