Commit c5403669 authored by David Reid's avatar David Reid

Improve performance of ma_node_uninit().

This changes the way ma_node_uninit() waits for the audio thread. Now,
instead of waiting for the *entire* graph to complete, it only waits
for it's local processing to complete.
parent 61c6e5b1
...@@ -583,43 +583,39 @@ data from the graph. Locking in these areas are achieved by means of spinlocks. ...@@ -583,43 +583,39 @@ data from the graph. Locking in these areas are achieved by means of spinlocks.
The main complication with keeping `ma_node_graph_read_pcm_frames()` lock-free stems from the fact The main complication with keeping `ma_node_graph_read_pcm_frames()` lock-free stems from the fact
that a node can be uninitialized, and it's memory potentially freed, while in the middle of being that a node can be uninitialized, and it's memory potentially freed, while in the middle of being
processed by `ma_node_graph_read_pcm_frames()` on the audio thread. To solve this, uninitialization processed on the audio thread. There are times when the audio thread will be referencing a node,
of a node with `ma_node_uninit()` will spin until `ma_node_graph_read_pcm_frames()` finishes. This which means the uninitialization process of a node needs to make sure it delays returning until the
is the most significant compromise employed by miniaudio as it can, in the worst case, result in audio thread is finished so that control is not handed back to the caller thereby giving them a
`ma_node_uninit()` spinning for as long as it takes to process an entire period of audio. For chance to free the node's memory.
example, if `ma_node_graph_read_pcm_frames()` takes a millisecond to process, then
`ma_node_uninit()` will spin for 1 millisecond in the worst case. You should take this into When the audio thread is processing a node, it does so by reading from each of the output buses of
consideration when designing your node management strategy - you may want to have some kind of the node. In order for a node to process data for one of it's output buses, it needs to read from
threaded cleanup routines. each of it's input buses, and so on an so forth. It follows that once all output buses of a node
are detached, the node as a whole will be disconnected and no further processing will occur unless
It's intuitive to think that instead of waiting for the *entire* graph to finish reading it would it's output buses are reattached, which won't be happening when the node is being uninitialized.
be possible to just wait for the node itself to finish, thereby making it a bit more efficient to By having `ma_node_detach_output_bus()` wait until the audio thread is finished with it, we can
uninitialize the node. This doesn't work because of how nodes are connected to each other. Nodes simplify a few things, at the expense of making `ma_node_detach_output_bus()` a bit slower. By
are connected to other nodes, which means there's always hole when doing things on a local level. doing this, the implementation of `ma_node_uninit()` becomes trivial - just detach all output
A higher level structure that encompasses the entire node graph is required for things to work nodes, followed by each of the attachments to each of it's input nodes, and then do any final clean
without holes. This is where the `ma_node_graph` object comes into it. This object encompasses the up.
entire node graph which enables us to do a waiting system without holes.
With the above design, the worst-case scenario is `ma_node_detach_output_bus()` taking as long as
The `ma_node_graph` has two members that are of relevance for implementing the waiting mechanism in it takes to process the output bus being detached. This will happen if it's called at just the
`ma_node_uninit()`. One is a flag that keeps track of whether or not we are reading from the graph. wrong moment where the audio thread has just iterated it and has just started processing. The
That is, whether or not we are executing `ma_node_graph_read_pcm_frames()`. This assumes that this caller of `ma_node_detach_output_bus()` will stall until the audio thread is finished, which
function is never called simultaneously by multiple threads, which wouldn't actually make sense includes the cost of recursively processing it's inputs. This is the biggest compromise made with
anyway. The other variable is a counter which increments at the end of every call to the approach taken by miniaudio for it's lock-free processing system. The cost of detaching nodes
`ma_node_graph_read_pcm_frames()`. In `ma_node_uninit()`, the node is first detached from the node earlier in the pipeline (data sources, for example) will be cheaper than the cost of detaching
graph with `ma_node_detach()`. It is safe to detach the node from the graph if it's still in the higher level nodes, such as some kind of final post-processing endpoint. If you need to do mass
middle of processing, so long as the node remains valid. Once the node is detached, it can be detachments, detach starting from the lowest level nodes and work your way towards the final
guaranteed that future calls to `ma_node_graph_read_pcm_frames()` will never iterate over the node. endpoint node (but don't try detaching the node graph's endpoint). If the audio thread is not
Then, a copy of the current read counter is retrieved followed by a check against the flag that running, detachment will be fast and detachment in any order will be the same.
tracks whether or not a read is currently being performed. If so, the `ma_node_uninit()` will spin
until the counter we retrieved earlier does not equal the current counter. Once the counters do not
match, waiting is over and the uninitialization process can continue without needing to worry about
thread-safety. See the implementation of `ma_node_uninit()` for more details on this.
Another compromise, albeit less significant, is locking when attaching and detaching nodes. This Another compromise, albeit less significant, is locking when attaching and detaching nodes. This
locking is achieved by means of a spinlock in order to reduce overhead. A lock is present for each locking is achieved by means of a spinlock in order to reduce memory overhead. A lock is present
input bus and output bus, totaling 4 for each node. When an output bus is connected to an input for each input bus and output bus, totaling 4 for each node. When an output bus is connected to an
bus, both the output bus and input bus is locked. This locking is specifically for attaching and input bus, both the output bus and input bus is locked. This locking is specifically for attaching
detaching across different threads and does not affect `ma_node_graph_read_pcm_frames()` in any and detaching across different threads and does not affect `ma_node_graph_read_pcm_frames()` in any
way. The locking and unlocking is mostly self-explanatory, but slightly less intuitive part comes way. The locking and unlocking is mostly self-explanatory, but slightly less intuitive part comes
into it when considering that iterating over attachments must not break as a result of attaching or into it when considering that iterating over attachments must not break as a result of attaching or
detaching a node while iteration is occuring. detaching a node while iteration is occuring.
...@@ -656,16 +652,6 @@ same call to `ma_node_graph_read_pcm_frames()`. This is an unusual scenario and ...@@ -656,16 +652,6 @@ same call to `ma_node_graph_read_pcm_frames()`. This is an unusual scenario and
go unnoticed by the majority of people, but it's still an issue to consider. go unnoticed by the majority of people, but it's still an issue to consider.
*/ */
/*
Open Questions
==============
- What should the maximum bus count be for each node? This is defined at compile time and each node
has a fixed sized array of structs representing the input and output buses. It's therefore
important to keep this within reason. It's currently set to 2 for each side (maximum of 2 input
buses and 2 output buses). The limit can be increased with MA_MAX_NODE_BUS_COUNT, so I'm thinking
that this should be set to the maximum needed by the stock nodes and if an application needs more
to support their custom effects they can just increase it themselves.
*/
/* Must never exceed 255. */ /* Must never exceed 255. */
#ifndef MA_MAX_NODE_BUS_COUNT #ifndef MA_MAX_NODE_BUS_COUNT
...@@ -759,8 +745,9 @@ struct ma_node_output_bus ...@@ -759,8 +745,9 @@ struct ma_node_output_bus
/* Mutable via multiple threads. The weird ordering here is for packing reasons. */ /* Mutable via multiple threads. The weird ordering here is for packing reasons. */
volatile ma_uint8 inputNodeInputBusIndex; /* The index of the input bus on the input. Required for detaching. */ volatile ma_uint8 inputNodeInputBusIndex; /* The index of the input bus on the input. Required for detaching. */
volatile ma_uint8 flags; /* Some state flags for tracking the read state of the output buffer. */ volatile ma_uint8 flags; /* Some state flags for tracking the read state of the output buffer. */
volatile ma_uint16 refCount; /* Reference count for some thread-safety when detaching. */
volatile ma_bool8 isAttached; /* This is used to prevent iteration of nodes that are in the middle of being detached. Used for thread safety. */
volatile ma_spinlock lock; /* Unfortunate lock, but significantly simplifies the implementation. Required for thread-safe attaching and detaching. */ volatile ma_spinlock lock; /* Unfortunate lock, but significantly simplifies the implementation. Required for thread-safe attaching and detaching. */
ma_uint8 padding[3];
volatile float volume; /* Linear 0..1 */ volatile float volume; /* Linear 0..1 */
volatile ma_node_output_bus* pNext; /* If null, it's the tail node or detached. */ volatile ma_node_output_bus* pNext; /* If null, it's the tail node or detached. */
volatile ma_node_output_bus* pPrev; /* If null, it's the head node or detached. */ volatile ma_node_output_bus* pPrev; /* If null, it's the head node or detached. */
...@@ -775,12 +762,12 @@ typedef struct ma_node_input_bus ma_node_input_bus; ...@@ -775,12 +762,12 @@ typedef struct ma_node_input_bus ma_node_input_bus;
struct ma_node_input_bus struct ma_node_input_bus
{ {
/* Mutable via multiple threads. */ /* Mutable via multiple threads. */
volatile ma_node_output_bus* pHead; /* If null, the list is empty. */ ma_node_output_bus head; /* Dummy head node for simplifying some lock-free thread-safety stuff. */
volatile ma_spinlock lock; /* Unfortunate lock, but significantly simplifies the implementation. Required for thread-safe attaching and detaching. */ volatile ma_spinlock lock; /* Unfortunate lock, but significantly simplifies the implementation. Required for thread-safe attaching and detaching. */
volatile ma_uint16 nextCounter; /* This is used to determine whether or not the input bus is finding the next node in the list. Used for thread safety when detaching output buses. */
/* Set once at startup. */ /* Set once at startup. */
ma_uint8 channels; /* The number of channels in the audio stream for this bus. */ ma_uint8 channels; /* The number of channels in the audio stream for this bus. */
ma_uint8 padding[2];
}; };
...@@ -2792,6 +2779,17 @@ static ma_bool32 ma_node_output_bus_has_read(ma_node_output_bus* pOutputBus) ...@@ -2792,6 +2779,17 @@ static ma_bool32 ma_node_output_bus_has_read(ma_node_output_bus* pOutputBus)
} }
static void ma_node_output_bus_set_is_attached(ma_node_output_bus* pOutputBus, ma_bool8 isAttached)
{
c89atomic_exchange_8(&pOutputBus->isAttached, isAttached);
}
static ma_bool8 ma_node_output_bus_is_attached(ma_node_output_bus* pOutputBus)
{
return c89atomic_load_8(&pOutputBus->isAttached);
}
static ma_result ma_node_output_bus_set_volume(ma_node_output_bus* pOutputBus, float volume) static ma_result ma_node_output_bus_set_volume(ma_node_output_bus* pOutputBus, float volume)
{ {
MA_ASSERT(pOutputBus != NULL); MA_ASSERT(pOutputBus != NULL);
...@@ -2819,7 +2817,6 @@ static ma_result ma_node_input_bus_init(ma_uint32 channels, ma_node_input_bus* p ...@@ -2819,7 +2817,6 @@ static ma_result ma_node_input_bus_init(ma_uint32 channels, ma_node_input_bus* p
MA_ASSERT(channels < 256); MA_ASSERT(channels < 256);
MA_ZERO_OBJECT(pInputBus); MA_ZERO_OBJECT(pInputBus);
pInputBus->pHead = NULL;
pInputBus->channels = (ma_uint8)channels; pInputBus->channels = (ma_uint8)channels;
return MA_SUCCESS; return MA_SUCCESS;
...@@ -2836,6 +2833,22 @@ static void ma_node_input_bus_unlock(ma_node_input_bus* pInputBus) ...@@ -2836,6 +2833,22 @@ static void ma_node_input_bus_unlock(ma_node_input_bus* pInputBus)
} }
static void ma_node_input_bus_next_begin(ma_node_input_bus* pInputBus)
{
c89atomic_fetch_add_16(&pInputBus->nextCounter, 1);
}
static void ma_node_input_bus_next_end(ma_node_input_bus* pInputBus)
{
c89atomic_fetch_sub_16(&pInputBus->nextCounter, 1);
}
static ma_uint16 ma_node_input_bus_get_next_counter(ma_node_input_bus* pInputBus)
{
return c89atomic_load_16(&pInputBus->nextCounter);
}
static ma_uint32 ma_node_input_bus_get_channels(const ma_node_input_bus* pInputBus) static ma_uint32 ma_node_input_bus_get_channels(const ma_node_input_bus* pInputBus)
{ {
return pInputBus->channels; return pInputBus->channels;
...@@ -2847,6 +2860,12 @@ static void ma_node_input_bus_detach__no_output_bus_lock(ma_node_input_bus* pInp ...@@ -2847,6 +2860,12 @@ static void ma_node_input_bus_detach__no_output_bus_lock(ma_node_input_bus* pInp
MA_ASSERT(pInputBus != NULL); MA_ASSERT(pInputBus != NULL);
MA_ASSERT(pOutputBus != NULL); MA_ASSERT(pOutputBus != NULL);
/*
Mark the output bus as detached first. This will prevent future iterations on the audio thread
from iterating this output bus.
*/
ma_node_output_bus_set_is_attached(pOutputBus, MA_FALSE);
/* /*
We cannot use the output bus lock here since it'll be getting used at a higher level, but we do We cannot use the output bus lock here since it'll be getting used at a higher level, but we do
still need to use the input bus lock since we'll be updating pointers on two different output still need to use the input bus lock since we'll be updating pointers on two different output
...@@ -2878,6 +2897,35 @@ static void ma_node_input_bus_detach__no_output_bus_lock(ma_node_input_bus* pInp ...@@ -2878,6 +2897,35 @@ static void ma_node_input_bus_detach__no_output_bus_lock(ma_node_input_bus* pInp
c89atomic_exchange_ptr(&pOutputBus->pPrev, NULL); /* As above. */ c89atomic_exchange_ptr(&pOutputBus->pPrev, NULL); /* As above. */
pOutputBus->pInputNode = NULL; pOutputBus->pInputNode = NULL;
pOutputBus->inputNodeInputBusIndex = 0; pOutputBus->inputNodeInputBusIndex = 0;
/*
For thread-safety reasons, we don't want to be returning from this straight away. We need to
wait for the audio thread to finish with the output bus. There's two things we need to wait
for. The first is the part that selects the next output bus in the list, and the other is the
part that reads from the output bus. Basically all we're doing is waiting for the input bus
to stop referencing the output bus.
We're doing this part last because we want the section above to run while the audio thread
is finishing up with the output bus, just for efficiency reasons. We marked the output bus as
detached right at the top of this function which is going to prevent the audio thread from
iterating the output bus again.
*/
/* Part 1: Wait for the current iteration to complete. */
while (ma_node_input_bus_get_next_counter(pInputBus) > 0) {
ma_yield();
}
/* Part 2: Wait for any reads to complete. */
while (c89atomic_load_16(&pOutputBus->refCount) > 0) {
ma_yield();
}
/*
At this point we're done detaching and we can be guaranteed that the audio thread is not going
to attempt to reference this output bus again (until attached again).
*/
} }
#if 0 /* Not used at the moment, but leaving here in case I need it later. */ #if 0 /* Not used at the moment, but leaving here in case I need it later. */
...@@ -2931,14 +2979,14 @@ static void ma_node_input_bus_attach(ma_node_input_bus* pInputBus, ma_node_outpu ...@@ -2931,14 +2979,14 @@ static void ma_node_input_bus_attach(ma_node_input_bus* pInputBus, ma_node_outpu
ma_node_input_bus_lock(pInputBus); ma_node_input_bus_lock(pInputBus);
{ {
ma_node_output_bus* pNewPrev = NULL; ma_node_output_bus* pNewPrev = NULL;
ma_node_output_bus* pNewNext = (ma_node_output_bus*)c89atomic_load_ptr(&pInputBus->pHead); ma_node_output_bus* pNewNext = (ma_node_output_bus*)c89atomic_load_ptr(&pInputBus->head.pNext);
/* Update the local output bus. */ /* Update the local output bus. */
c89atomic_exchange_ptr(&pOutputBus->pPrev, pNewPrev); c89atomic_exchange_ptr(&pOutputBus->pPrev, pNewPrev);
c89atomic_exchange_ptr(&pOutputBus->pNext, pNewNext); c89atomic_exchange_ptr(&pOutputBus->pNext, pNewNext);
/* Update the other output buses to point back to the local output bus. */ /* Update the other output buses to point back to the local output bus. */
c89atomic_exchange_ptr(&pInputBus->pHead, pOutputBus); /* <-- This is where the output bus is actually attached to the input bus. */ c89atomic_exchange_ptr(&pInputBus->head.pNext, pOutputBus); /* <-- This is where the output bus is actually attached to the input bus. */
/* Do the previous pointer last. This is only used for detachment. */ /* Do the previous pointer last. This is only used for detachment. */
if (pNewNext != NULL) { if (pNewNext != NULL) {
...@@ -2946,24 +2994,59 @@ static void ma_node_input_bus_attach(ma_node_input_bus* pInputBus, ma_node_outpu ...@@ -2946,24 +2994,59 @@ static void ma_node_input_bus_attach(ma_node_input_bus* pInputBus, ma_node_outpu
} }
} }
ma_node_input_bus_unlock(pInputBus); ma_node_input_bus_unlock(pInputBus);
/*
Mark the node as attached last. This is used to controlling whether or the output bus will be
iterated on the audio thread. Mainly required for detachment purposes.
*/
ma_node_output_bus_set_is_attached(pOutputBus, MA_TRUE);
} }
ma_node_output_bus_unlock(pOutputBus); ma_node_output_bus_unlock(pOutputBus);
} }
static ma_node_output_bus* ma_node_input_bus_next(ma_node_input_bus* pInputBus, ma_node_output_bus* pOutputBus) static ma_node_output_bus* ma_node_input_bus_next(ma_node_input_bus* pInputBus, ma_node_output_bus* pOutputBus)
{ {
ma_node_output_bus* pNext;
MA_ASSERT(pInputBus != NULL); MA_ASSERT(pInputBus != NULL);
if (pOutputBus == NULL) { if (pOutputBus == NULL) {
return NULL; return NULL;
} }
return (ma_node_output_bus*)c89atomic_load_ptr(&pOutputBus->pNext); ma_node_input_bus_next_begin(pInputBus);
{
pNext = pOutputBus;
for (;;) {
pNext = (ma_node_output_bus*)c89atomic_load_ptr(&pNext->pNext);
if (pNext == NULL) {
break; /* Reached the end. */
}
if (ma_node_output_bus_is_attached(pNext) == MA_FALSE) {
continue; /* The node is not attached. Keep checking. */
}
/* The next node has been selected. */
break;
}
/* We need to increment the reference count of the selected node. */
if (pNext != NULL) {
c89atomic_fetch_add_16(&pNext->refCount, 1);
}
/* The previous node is no longer being referenced. */
c89atomic_fetch_sub_16(&pOutputBus->refCount, 1);
}
ma_node_input_bus_next_end(pInputBus);
return pNext;
} }
static ma_node_output_bus* ma_node_input_bus_first(ma_node_input_bus* pInputBus) static ma_node_output_bus* ma_node_input_bus_first(ma_node_input_bus* pInputBus)
{ {
return (ma_node_output_bus*)c89atomic_load_ptr(&pInputBus->pHead); return ma_node_input_bus_next(pInputBus, &pInputBus->head);
} }
...@@ -2972,6 +3055,7 @@ static ma_result ma_node_input_bus_read_pcm_frames(ma_node* pInputNode, ma_node_ ...@@ -2972,6 +3055,7 @@ static ma_result ma_node_input_bus_read_pcm_frames(ma_node* pInputNode, ma_node_
{ {
ma_result result = MA_SUCCESS; ma_result result = MA_SUCCESS;
ma_node_output_bus* pOutputBus; ma_node_output_bus* pOutputBus;
ma_node_output_bus* pFirst;
ma_uint32 inputChannels; ma_uint32 inputChannels;
/* /*
...@@ -2994,42 +3078,17 @@ static ma_result ma_node_input_bus_read_pcm_frames(ma_node* pInputNode, ma_node_ ...@@ -2994,42 +3078,17 @@ static ma_result ma_node_input_bus_read_pcm_frames(ma_node* pInputNode, ma_node_
inputChannels = ma_node_input_bus_get_channels(pInputBus); inputChannels = ma_node_input_bus_get_channels(pInputBus);
/* /*
A few optimizations here: We need to be careful with how we call ma_node_input_bus_first() and ma_node_input_bus_next(). They
are both critical to our lock-free thread-safety system. We can only call ma_node_input_bus_first()
1) If there's only a single attachment we just write directly to the output buffer. once per iteration, however we have an optimization to checks whether or not it's the first item in
2) For the first attachment, we do not mix. Instead we just overwrite. the list. We therefore need to store a pointer to the first item rather than repeatedly calling
ma_node_input_bus_first(). It's safe to keep hold of this point, so long as we don't dereference it
after calling ma_node_input_bus_next(), which we won't be.
*/ */
if (ma_node_input_bus_next(pInputBus, ma_node_input_bus_first(pInputBus)) == NULL) { pFirst = ma_node_input_bus_first(pInputBus);
/* Fast path. There's only one attachment Read straight into the output buffer. */
pOutputBus = ma_node_input_bus_first(pInputBus);
if (pOutputBus != NULL) {
/* Fast path. Input and output channel counts are the same. No conversion necessary. */
ma_uint32 framesRead;
float volume = ma_node_output_bus_get_volume(pOutputBus);
if (volume > 0) {
result = ma_node_read_pcm_frames(pOutputBus->pNode, pOutputBus->outputBusIndex, pFramesOut, frameCount, &framesRead);
} else {
result = ma_node_read_pcm_frames(pOutputBus->pNode, pOutputBus->outputBusIndex, NULL, frameCount, &framesRead); /* Volume is muted. This resolves to a seek. */
}
if (result == MA_SUCCESS) {
/* Any frames that were not read need to be filled with silence. */
if (framesRead < frameCount) {
ma_silence_pcm_frames(ma_offset_pcm_frames_ptr(pFramesOut, framesRead, ma_format_f32, inputChannels), (frameCount - framesRead), ma_format_f32, inputChannels);
}
}
/* Apply volume, if necessary. */
if (volume > 0 && volume != 1) {
ma_apply_volume_factor_f32(pFramesOut, framesRead * inputChannels, volume);
}
*pFramesRead = framesRead;
}
} else {
/* Slow path. There's multiple output buses or channel conversion is required. We need to mix before outputting. pFramesOut is the accumulation buffer. */ /* Slow path. There's multiple output buses or channel conversion is required. We need to mix before outputting. pFramesOut is the accumulation buffer. */
for (pOutputBus = ma_node_input_bus_first(pInputBus); pOutputBus != NULL; pOutputBus = ma_node_input_bus_next(pInputBus, pOutputBus)) { for (pOutputBus = pFirst; pOutputBus != NULL; pOutputBus = ma_node_input_bus_next(pInputBus, pOutputBus)) {
ma_uint32 framesProcessed = 0; ma_uint32 framesProcessed = 0;
MA_ASSERT(pOutputBus->pNode != NULL); MA_ASSERT(pOutputBus->pNode != NULL);
...@@ -3052,7 +3111,7 @@ static ma_result ma_node_input_bus_read_pcm_frames(ma_node* pInputNode, ma_node_ ...@@ -3052,7 +3111,7 @@ static ma_result ma_node_input_bus_read_pcm_frames(ma_node* pInputNode, ma_node_
pRunningFramesOut = ma_offset_pcm_frames_ptr_f32(pFramesOut, framesProcessed, inputChannels); pRunningFramesOut = ma_offset_pcm_frames_ptr_f32(pFramesOut, framesProcessed, inputChannels);
if (pOutputBus == ma_node_input_bus_first(pInputBus)) { if (pOutputBus == pFirst) {
/* Fast path. First attachment. We just read straight into the output buffer (no mixing required). */ /* Fast path. First attachment. We just read straight into the output buffer (no mixing required). */
result = ma_node_read_pcm_frames(pOutputBus->pNode, pOutputBus->outputBusIndex, pRunningFramesOut, framesToRead, &framesJustRead); result = ma_node_read_pcm_frames(pOutputBus->pNode, pOutputBus->outputBusIndex, pRunningFramesOut, framesToRead, &framesJustRead);
} else { } else {
...@@ -3072,7 +3131,7 @@ static ma_result ma_node_input_bus_read_pcm_frames(ma_node* pInputNode, ma_node_ ...@@ -3072,7 +3131,7 @@ static ma_result ma_node_input_bus_read_pcm_frames(ma_node* pInputNode, ma_node_
} }
/* If it's the first attachment we didn't do any mixing. Any leftover samples need to be silenced. */ /* If it's the first attachment we didn't do any mixing. Any leftover samples need to be silenced. */
if (pOutputBus == ma_node_input_bus_first(pInputBus) && framesProcessed < frameCount) { if (pOutputBus == pFirst && framesProcessed < frameCount) {
ma_silence_pcm_frames(ma_offset_pcm_frames_ptr(pFramesOut, framesProcessed, ma_format_f32, inputChannels), (frameCount - framesProcessed), ma_format_f32, inputChannels); ma_silence_pcm_frames(ma_offset_pcm_frames_ptr(pFramesOut, framesProcessed, ma_format_f32, inputChannels), (frameCount - framesProcessed), ma_format_f32, inputChannels);
} }
...@@ -3088,7 +3147,6 @@ static ma_result ma_node_input_bus_read_pcm_frames(ma_node* pInputNode, ma_node_ ...@@ -3088,7 +3147,6 @@ static ma_result ma_node_input_bus_read_pcm_frames(ma_node* pInputNode, ma_node_
/* In this path we always "process" the entire amount. */ /* In this path we always "process" the entire amount. */
*pFramesRead = frameCount; *pFramesRead = frameCount;
}
return result; return result;
} }
...@@ -3248,7 +3306,6 @@ MA_API ma_result ma_node_init(ma_node_graph* pNodeGraph, const ma_node_config* p ...@@ -3248,7 +3306,6 @@ MA_API ma_result ma_node_init(ma_node_graph* pNodeGraph, const ma_node_config* p
MA_API void ma_node_uninit(ma_node* pNode, const ma_allocation_callbacks* pAllocationCallbacks) MA_API void ma_node_uninit(ma_node* pNode, const ma_allocation_callbacks* pAllocationCallbacks)
{ {
ma_node_base* pNodeBase = (ma_node_base*)pNode; ma_node_base* pNodeBase = (ma_node_base*)pNode;
volatile ma_uint32 readCounter; /* <-- Marking this as volatile just to make it clear that a copy of this variable must be made, and the compiler must not optimize it away. */
if (pNodeBase == NULL) { if (pNodeBase == NULL) {
return; return;
...@@ -3256,50 +3313,12 @@ MA_API void ma_node_uninit(ma_node* pNode, const ma_allocation_callbacks* pAlloc ...@@ -3256,50 +3313,12 @@ MA_API void ma_node_uninit(ma_node* pNode, const ma_allocation_callbacks* pAlloc
/* /*
The first thing we need to do is fully detach the node. This will detach all inputs and The first thing we need to do is fully detach the node. This will detach all inputs and
outputs. We need to do this first because the next section, where we wait for any current reads outputs. We need to do this first because it will sever the connection with the node graph and
to complete, depends on the property of this node being guaranteed not to be enumerated in any allow us to complete uninitialization without needing to worry about thread-safety with the
future reads that might happen on another thread while this function is running. audio thread. The detachment process will wait for any local processing of the node to finish.
There's a possibility that the node will be in the middle of a read at this point. This is OK
because thread-safety on detachment is handled for this case.
*/ */
ma_node_detach(pNode); ma_node_detach(pNode);
/*
Now that the node has been detached need to wait for any active reads to complete. If we don't
do this, the caller may end up freeing memory allocating for this node while the node graph is
in the middle of reading.
If at this point the audio thread calls `ma_node_graph_read_pcm_frames()`, we will be safe
because we've already detached the node.
*/
/*
We need to retrieve the read counter *before* checking whether or not a read is in progress.
*/
readCounter = ma_node_graph_get_read_counter(pNodeBase->pNodeGraph); /* <-- This copy must happen. */
c89atomic_compiler_fence(); /* <-- We must extract the read counter before checking ma_node_graph_is_reading(). */
/*
Once we've got a local copy of the read counter we can check whether or not the node graph is
in the middle of a read. If it is, we need to wait for it to complete. This is where the read
counter we retrieved just above comes into it. Every time a read completes, the counter is
incremented. While the read counter is equal to the counter we just extracted above, the read
is still happening.
If we were to not check if the graph is currently reading (`ma_node_graph_is_reading()`) we'd
end up stuck in an infinite loop because the counter would never end up incrementing. This is
why we need to extract the read counter before performing this check. If we were to do the
reading check first, it would be possible to end up in a situation where the reading process
completes *just* before we extract the counter and enter the wait loop, thereby creating an
infinite loop.
*/
if (ma_node_graph_is_reading(pNodeBase->pNodeGraph)) {
while (readCounter == ma_node_graph_get_read_counter(pNodeBase->pNodeGraph)) {
ma_yield();
}
}
/* /*
At this point the node should be completely unreferenced by the node graph and we can finish up At this point the node should be completely unreferenced by the node graph and we can finish up
the uninitialization process without needing to worry about thread-safety. the uninitialization process without needing to worry about thread-safety.
...@@ -3403,15 +3422,25 @@ MA_API ma_result ma_node_detach(ma_node* pNode) ...@@ -3403,15 +3422,25 @@ MA_API ma_result ma_node_detach(ma_node* pNode)
ma_node_detach_output_bus(pNode, iOutputBus); ma_node_detach_output_bus(pNode, iOutputBus);
} }
/* Any nodes attached to this node as an input need to be detached as well. */ /*
At this point all output buses will have been detached from the graph and we can be guaranteed
that none of it's input nodes will be getting processed by the graph. We can detach these
without needing to worry about the audio thread touching them.
*/
for (iInputBus = 0; iInputBus < ma_node_get_input_bus_count(pNode); iInputBus += 1) { for (iInputBus = 0; iInputBus < ma_node_get_input_bus_count(pNode); iInputBus += 1) {
ma_node_input_bus* pInputBus; ma_node_input_bus* pInputBus;
ma_node_output_bus* pOutputBus; ma_node_output_bus* pOutputBus;
pInputBus = &pNodeBase->inputBuses[iInputBus]; pInputBus = &pNodeBase->inputBuses[iInputBus];
for (pOutputBus = ma_node_input_bus_first(pInputBus); pOutputBus != NULL; pOutputBus = ma_node_input_bus_next(pInputBus, pOutputBus)) { /*
ma_node_detach_output_bus(pOutputBus->pNode, pOutputBus->outputBusIndex); This is important. We cannot be using ma_node_input_bus_first() or ma_node_input_bus_next(). Those
functions are specifically for the audio thread. We'll instead just manually iterate using standard
linked list logic. We don't need to worry about the audio thread referencing these because the step
above severed the connection to the graph.
*/
for (pOutputBus = (ma_node_output_bus*)c89atomic_load_ptr(&pInputBus->head.pNext); pOutputBus != NULL; pOutputBus = (ma_node_output_bus*)c89atomic_load_ptr(&pOutputBus->pNext)) {
ma_node_detach_output_bus(pOutputBus->pNode, pOutputBus->outputBusIndex); /* This won't do any waiting in practice and should be efficient. */
} }
} }
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment