Optimization and Node Execution
  • I have a quick question about node execution: does every node get updated on every update cycle, or does Audulus evaluate the state of triggers or some other value to determine whether nodes need to be evaluated? For example, I'm looking at simple examples like afta8's awesome sequencer subpatch (which is maybe not so simple!), which compares the current counter value for equality against every possible step (1 through 16). I was imagining that a binary search tree would be faster (4 comparisons versus 16), not just because it would require fewer comparisons to find the right step, but because it could potentially eliminate calculations of all further downstream nodes. But this improvement would only hold if nodes are conditionally executed, such that taking a valid branch eliminates all calculations on the other branch. I could certainly imagine some very cool designs where a subpatch had an enable bit: if 1, execute the subpatch; if 0, hold the last output. We could then build more complex structures which are executed only on occasion, rather than every cycle.
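    The comparison idea can be sketched like this (hypothetical Python, not anything from Audulus; `linear_select` and `binary_select` are invented names):

```python
# Hypothetical sketch: picking the active step of a 16-step sequencer.
# linear_select mirrors the subpatch (16 equality tests, inactive steps
# multiplied by 0); binary_select finds the step in log2(16) = 4 comparisons.

def linear_select(counter, values):
    # each step's value is multiplied by (counter == step), True/False as 1/0
    return sum(v * (counter == step) for step, v in enumerate(values, start=1))

def binary_select(counter, values, lo=1, hi=16):
    # halve the candidate range until one step remains: 4 comparisons
    while lo < hi:
        mid = (lo + hi) // 2
        if counter <= mid:
            hi = mid
        else:
            lo = mid + 1
    return values[lo - 1]

steps = [i * 10 for i in range(1, 17)]
assert linear_select(7, steps) == binary_select(7, steps) == 70
```

    Of course, the binary version only wins in practice if the untaken branches are actually skipped, which is exactly the question.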

    I realize that on modern hardware this may be silly, since 16 equality comparisons are trivial, but we'll always find a way to max out the CPU of whatever is at hand! The above-noted sequencer multiplies all input values by 0 except the one active step, but I could easily imagine more complex behavior where a bunch of work is done and then multiplied by 0, eating cycles.

    -John In Boston

    PS - Audulus rocks, but you already knew that! Just added it to my Mac after noodling on the iPad version for a couple of months.
  • Hey John,

    Welcome to the forum! Thanks for the kind words!

    Yes, every node gets updated every cycle, but nodes take shortcuts when values aren't changing (a node can ask if an input is constant over the whole cycle).
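    A toy sketch of that shortcut (invented names, not Audulus internals):

```python
# Hypothetical sketch of the constant-input shortcut described above:
# a node can ask whether an input holds one value for the whole cycle,
# and if so, do the math once instead of once per sample.

BLOCK_SIZE = 128

class Input:
    def __init__(self, samples):
        self.samples = samples

    def is_constant(self):
        # constant over the whole cycle?
        return all(s == self.samples[0] for s in self.samples)

def gain_node(signal, gain):
    if signal.is_constant() and gain.is_constant():
        # shortcut: one multiply, reused for all 128 samples
        y = signal.samples[0] * gain.samples[0]
        return [y] * BLOCK_SIZE
    # worst case: per-sample multiply
    return [s * g for s, g in zip(signal.samples, gain.samples)]
```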

    The problem with the case where an expensive result is zeroed out is that there may be state involved in computing the result (an oscillator's phase, a reverb tail, etc.), so in general I can't shortcut that sort of computation. However, it would be cool to come up with a way to shortcut computation for stateless nodes.
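    To illustrate why state blocks the shortcut, here's a hypothetical sketch (not Audulus code): even when its output is zeroed, an oscillator's phase must keep advancing, or it would resume from a stale phase when the gate reopens.

```python
import math

# Hypothetical sketch: a stateful oscillator whose phase must advance
# every block, even when downstream the output is multiplied by zero.

class SineOsc:
    def __init__(self, freq, sample_rate=44100):
        self.phase = 0.0
        self.inc = 2 * math.pi * freq / sample_rate

    def process(self, n):
        out = []
        for _ in range(n):
            out.append(math.sin(self.phase))
            self.phase = (self.phase + self.inc) % (2 * math.pi)
        return out

osc = SineOsc(440.0)
muted = [s * 0.0 for s in osc.process(128)]  # output zeroed out...
# ...but the phase still advanced; skipping process() would leave it
# stale and cause a discontinuity when the signal is un-muted.
```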

    I've also been working on optimizations that help the worst-case scenario of everything changing all the time: extensive use of SIMD instructions and reducing the sample rate where appropriate.

    Your enable bit suggestion is an excellent idea! That would be quite elegant. Also, if you have any ideas for other evaluation methods, I'm all ears :-)

    - Taylor
  • Thanks for the speedy reply. It makes sense that you can't know whether some calculation will be needed in the future if a gate goes from off to on.

    By update cycle, I assume you mean the standard audio processing block size? I think I recall reading in another thread here, with respect to feedback, that there is no single-sample delay block because everything is updated 128 (or 256, or some other power of 2) samples at a time. So even though I see patches with outputs wired back around to inputs (usually through an Add, but not always), what's really happening is that 128 output samples are calculated, and this vector is fed into the next update block as 128 inputs to the Add, resulting in a 128/44100 sec latency. And you're saying above that if the 128 inputs are constant, the node only has to do the math once. Is this correct?
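    That block-feedback picture can be sketched like so (hypothetical Python; `run` and the 0.5 feedback gain are invented for illustration):

```python
# Hypothetical sketch of block-based feedback: a loop wired back through
# an Add node delays the fed-back signal by one whole block (128 samples).

BLOCK = 128

def run(blocks_in, gain=0.5):
    fed_back = [0.0] * BLOCK          # last block's output, initially silence
    out_blocks = []
    for block in blocks_in:
        out = [x + gain * fb for x, fb in zip(block, fed_back)]
        out_blocks.append(out)
        fed_back = out                # becomes the Add's input next cycle
    return out_blocks

impulse = [[1.0] + [0.0] * (BLOCK - 1), [0.0] * BLOCK]
y = run(impulse)
# the echo of the impulse appears at the start of the *next* block,
# i.e. 128 samples (128/44100 sec) later
assert y[0][0] == 1.0 and y[1][0] == 0.5
```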

    I'm curious because I was looking into implementing a non-linear version of the Moog ladder filter that I found in a paper, but this would require several single-sample feedback paths, which my Matlab prototype suggested were rather essential to properly capture the screaming self-oscillation! But I digress...
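    For flavor, here's a minimal per-sample sketch of a four-stage ladder-style filter with tanh saturation. This is a loose illustration of why single-sample feedback matters, not the model from any particular paper; the coefficients `g` and `k` are made up:

```python
import math

# Hypothetical sketch of a nonlinear four-stage ladder-style filter.
# The global feedback (k * stages[3]) must be applied one sample at a
# time, which a block-based feedback loop can't express.

def ladder(x, stages, g=0.5, k=3.5):
    # one sample in, one sample out; stages holds the four filter states
    u = math.tanh(x - k * stages[3])  # single-sample feedback from stage 4
    for i in range(4):
        stages[i] += g * (math.tanh(u) - math.tanh(stages[i]))
        u = stages[i]
    return stages[3]

state = [0.0] * 4
y = [ladder(s, state) for s in ([1.0] + [0.0] * 127)]
```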
  • That's exactly right: feedback loops result in a delay of 128/44100 sec. And yes, if the 128 samples are constant (well, more precisely, if the constant flag is set), then the node generally just does a calculation on that constant value.

    I'm actually working on single-sample feedback -- you'll be able to set the desired delay on a feedback connection. It's going to take me a while, though, because I have to do several other optimizations to get there.

    - Taylor