WDF optimisation related question, not issue #37
Thanks for the questions, and for your original work on WDF++! The reason for The Thanks, |
With a first quick approach of the problem, a partial implementation (only elements needed for the
The same test compiled with the Microsoft Visual Studio 2019 compiler (version: 16.10.0)
So, the first impression is that the compiler produces better code with constexpr and some compile-time optimizations than with (forced) inlining. And because the code is so simple, the difference between the built-in VS2019 compiler and the LLVM-Clang extension is less pronounced than in a more complex project. I have just read your answer. I have not used SIMD optimization, neither in your implementation nor in mine. So you're right, casting can be a bottleneck, but I'm not sure of that, particularly with constexpr, which is a compile-time optimization. About the usage of constexpr, it's simple: I have replaced all the inline functions with constexpr ones and let the compiler optimize the code where it can be optimized. The only methods that are not clear candidates for constexpr are the constructors, the setters and the parenting methods (aka Yes, the constexpr technique is promising, particularly in this kind of architecture; you can clearly see that the difference is notable. I will now attempt to modify your implementation ( Friendly, |
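To illustrate the constexpr idea discussed above, here is a minimal sketch. It is not taken from either implementation: the `Resistor` and `Capacitor` classes, their members and the `seriesImpedance` and `runtimeSeries` helpers are assumptions made for the example. The port impedances are constexpr functions, so the compiler may fold them at compile time when the inputs are constant expressions, while the same code still works with run-time values.

```cpp
#include <cassert>

// Hypothetical one-port elements; names are illustrative, not the library's API.
template <typename T>
class Resistor
{
public:
    constexpr explicit Resistor(T resistance) : R(resistance) {}
    // The port impedance of an adapted resistor is its resistance.
    constexpr T impedance() const { return R; }
private:
    T R;
};

template <typename T>
class Capacitor
{
public:
    constexpr Capacitor(T capacitance, T sampleRate) : C(capacitance), fs(sampleRate) {}
    // Bilinear-transform port impedance: 1 / (2 * C * fs).
    constexpr T impedance() const { return T(1) / (T(2) * C * fs); }
private:
    T C;
    T fs;
};

// Port impedance seen at a series connection: the sum of both port impedances.
template <typename P1, typename P2>
constexpr auto seriesImpedance(const P1& a, const P2& b)
    -> decltype(a.impedance() + b.impedance())
{
    return a.impedance() + b.impedance();
}

// With constant inputs, everything below is evaluated by the compiler.
constexpr Resistor<double> r1 { 1000.0 };
constexpr Capacitor<double> c1 { 1.0e-6, 48000.0 };
static_assert(r1.impedance() == 1000.0, "evaluated at compile time");
static_assert(seriesImpedance(r1, c1) > 1000.0, "folded at compile time");

// The same functions remain usable with values only known at run time.
inline double runtimeSeries(double r, double c, double fs)
{
    assert(fs > 0.0);
    return seriesImpedance(Resistor<double> { r }, Capacitor<double> { c, fs });
}
```

Whether this beats forced inlining in a real circuit still has to be measured; the sketch only shows where constexpr can be applied.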
OK, so the constexpr modification on your implementation (without SIMD) does not provide a great improvement. This is due to the fact that your Compiled the same way with LLVM-Clang:
So, I will investigate a new architecture to remove some redundancy and code duplication. Even if it's not heavy on memory, using a macro to embed the members this way is clearly an open door to side effects. Today, with the latest C++ versions, we can avoid macros and hidden code (SIMD) by using compile-time inclusion tricks (do not need private 'internal' sub-methods (ex. But because most of the changes will need a recent compiler, I think it's better to work on a third version of the header. Oh yes, I just want to say: what good work, I love your plugins and the overall quality of your work. It's a rare repo that impresses me so much!!! |
Oh wow, this is very interesting! First off, I didn't know that Yeah, I definitely wasn't super happy with using a macro for the internal variables, but couldn't think of a better way to do it at the time. If you have a better solution I'd be happy to merge that. Thanks so much for the kind words, and for appreciating this project! Feel free to make a pull request with whatever changes you have. |
You're welcome. Yes, constexpr is a deeper concept than inlining; it can produce not only faster code but potentially a smaller executable. constexpr usage is sometimes cluttered with obscure metaprogramming concepts, but it's a great feature for modern code. Mixed with some type traits, it allows some great optimizations for high-level users. For example, a little trick, just for the idea:

template <typename T>
class Base
{
[...]
typedef T BaseT;
};

template <typename T>
class Resistor : public Base<T>
{
[...]
};
template <typename T>
class Capacitor : public Base<T>
{
[...]
};

template <typename P1, typename P2,
typename T = typename std::common_type<typename P1::BaseT,
typename P2::BaseT>::type>
class Parallel
{
[...]
};

The idea is to use a type-promotion-like technique to avoid type declarations at user level.

void wiring()
{
using namespace wdf;
auto R1 = Resistor { 300.0 };
auto C1 = Capacitor { 1.0e-6 };
auto P1 = Parallel { R1, C1 };
}

Of course a I need to investigate the

Last year I did a C++ port of ACME.jl (Analog Circuit Modeling and Emulation for Julia), which is based on the 2015 paper "Automatic Decomposition of Non-Linear Equation Systems in Audio Effect Circuit Simulation". The interest of ACME is that you can build and run a circuit with multiple non-linearities without the constraints of basic WDF (a single non-linearity at the root). Even if the project is well done and offers some great implementations of circuit elements (e.g. a Gummel-Poon model of the BJT), it is not really suited to becoming part of a real-time audio framework. I got good performance results using the Fastor library with automatic generation of a C++ solver, as you can read here: HSU-ANT/ACME.jl#28 ... But I think it's possible to get better performance with the WDF concept using the method introduced by Bernardini & Werner ("Multi-Port NonLinearities in Wave Digital Structures").

Some years ago, I talked with Maximilian Rest about the RT-WDF project, which seems stillborn (no activity). The code is heavy and the most important part is not handled by the framework: decomposing a simple circuit into classic WDF branches and R-type adaptors for the non-linearities. The concept is not explained in great detail in the papers, but it seems to follow this procedure:

Your WDF toolkit is great and powerful, but it could be more useful with a third-party generator that produces ready-to-use optimized source code, with the NL solver code auto-generated. The idea is not to provide a full SPICE-like tool, but just a helper to generate at least some cool little circuits: tone stack, triode amplifier, envelope follower ... some little effect circuits like a treble booster, a wah, a fuzz. That requires adding some other classic elements: BJT, JFET, op-amp ... For now, I'll stay focused on my primary idea: provide a third (experimental) header with some improvements to your already good code. Let me look at that this week. |
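The type-promotion trick from the `Base` / `Parallel` snippet earlier in this comment can be made compilable as follows. This is only a sketch under assumptions: the `sketch` namespace, the public members and the `makeParallel` factory are hypothetical and not part of the library. The factory is used here so the example also builds before C++17; with C++17, a deduction guide could give the brace syntax of `wiring()` instead.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch
{
template <typename T>
class Base
{
public:
    typedef T BaseT; // public, so that adaptors can query the sample type
};

template <typename T>
class Resistor : public Base<T>
{
public:
    explicit Resistor(T r) : R(r) {}
    T R;
};

template <typename T>
class Capacitor : public Base<T>
{
public:
    explicit Capacitor(T c) : C(c) {}
    T C;
};

// T defaults to the common (promoted) type of both child sample types.
template <typename P1, typename P2,
          typename T = typename std::common_type<typename P1::BaseT,
                                                 typename P2::BaseT>::type>
class Parallel : public Base<T>
{
public:
    Parallel(P1& a, P2& b) : port1(a), port2(b) {}
    P1& port1;
    P2& port2;
};

// Hypothetical factory: deduces P1 and P2 so the user never spells them out.
template <typename P1, typename P2>
Parallel<P1, P2> makeParallel(P1& a, P2& b)
{
    return Parallel<P1, P2>(a, b);
}
} // namespace sketch

inline bool promotionDemo()
{
    sketch::Resistor<float> R1 { 300.0f };
    sketch::Capacitor<double> C1 { 1.0e-6 };
    auto P1 = sketch::makeParallel(R1, C1);
    // float and double promote to double.
    static_assert(std::is_same<decltype(P1)::BaseT, double>::value,
                  "sample type promoted to double");
    return &P1.port1 == &R1; // the adaptor refers to the original element
}
```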
Ah yeah, I'll definitely have to experiment with transitioning some more of my Very cool that SIMD is being added to the standard library! I'll have to give that a try soon. That said, the Thanks, |
Yes you're right (issues with Of course I'm interested in the WDF subject, but I'm not an expert, even in coding. However, I think I'm passionate enough about the subject to give good advice, well, I hope. Yes, generating code is fairly trivial, and once the method to build the decomposed matrices is found, I would suggest using Fastor as the backend for the NL solver. The benchmarks found here (https://github.com/romeric/Fastor/wiki/Benchmarks) show that it is the perfect candidate to outperform all the other linear algebra frameworks in the static case (compile-time fixed-size matrices). Sincerely, |
I have read the 2016 dissertation of Kurt James Werner, the one cited as a reference in your code. So the key point is the SPQR tree. The problem: there's no standalone fast C++ implementation of this graph decomposition method. The only implementations are inside the SageMath framework (a heavy installer: 900 MB, and 2 hours to install the thousands and thousands of files on my computer) and in the Open Graph Drawing Framework (OGDF). Sage: Python for the SPQR tree and graph implementation, with a graph backend using the C++ Boost library (graph framework). It's a pity to have to depend on a third-party library, but it seems that the SPQR conversion is not really easy to code. Netlist -> Graph : Trivial Maybe the way to go is to code the converter in Python, and so be able to use either netlist input (text-like files) or PySpice directly to generate circuits on the fly (no schematic editor). I will look for an existing MNA solution in Python, but this is not the hardest part and it can be coded for the specific case. The C++ auto-generation can be done à la Guitarix, using template-based cpp/h models. What is the current state in our Discord? I don't want to interfere with a potential hidden development. I just have one doubt: You only use the Sincerely, |
Yes! So the current implementation in this library is just performing the scattering matrix operation for linear non-adaptable elements. I have a couple rough examples of the I think the actual generation of the SPQR tree is a little bit beyond the scope of this library. I've had a couple conversations with folks recently who have been working on implementing this type of thing using languages like Julia, MATLAB, and Mathematica (apparently the best for symbolically solving matrix equations). There's been some experimenting with Python as well, but symbolically solving the matrix equations in Python seems to be unbearably slow. One idea that we had was to try making a little web interface where someone could upload their circuit netlist, and everything else would get generated from there. Anyway, still a lot of work to do on that front. I've contacted the admin of the Discord channel, so hopefully we can add you there soon! Thanks, |
Cool.
No, for sure. But I was suggesting a Python script to do the following steps:
Maybe I've misunderstood something, but I think this is the idea. So, the only auto-generated C++ files are those that include the static matrices from the Modified Nodal Analysis and call the (yet to be written) multi-dimensional Newton-Raphson solvers in the real-time process. Or maybe pre-generated LUTs for the different NL elements, for high-speed processing.

PS: I think the internal instantiation of adaptors (Parallel / Series) from the previous trick can be solved with this method. Just use a variadic template with perfect forwarding and a reference wrapper, and delegate the instantiation to a private specialized constructor:

template <typename T>
class Parallel final : public Base
{
public:
template <typename ...Args>
Parallel(Args&&... args)
: Parallel(std::ref(std::forward<Args>(args))...)
{}
private:
Parallel(std::reference_wrapper<Base> port1,
std::reference_wrapper<Base> port2)
: p1(port1)
, p2(port2)
{
// do what you do
}
public:
std::reference_wrapper<Base> p1;
std::reference_wrapper<Base> p2;
};

With this method, the only template parameter needed is the sample type, for example:

template <typename T>
class MyWDFCircuit
{
public:
[...]
private:
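// Note: a using-directive is not allowed at class scope; this assumes
// MyWDFCircuit is declared inside the namespace that provides the elements.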
Capacitor<T> C1 { T(250e-12) };
Capacitor<T> C2 { T(20e-9) };
Resistor<T> R1 { T(250e3) };
Resistor<T> R2 { T(1e6) };
Series<T> S1 { C1, R1 };
Parallel<T> P1 { S1, R2 };
};

I've tested the idea: it works with virtual methods on the base class and, given the constexpr nature of the methods, the reference_wrapper should not affect performance. But that needs to be tested thoroughly to be sure. Maybe the variadic template method is more powerful than the tuple expansion in the |
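A compilable variant of the reference-wrapper idea above, as a sketch only. The `sketch` namespace, the `impedance()` interface and the restriction to lvalue ports are assumptions made for this example; the adaptor takes lvalues because it stores references, and `std::ref` is deleted for temporaries anyway. The public variadic constructor wraps each port in a `std::reference_wrapper<Base<T>>` and delegates to a private two-port constructor.

```cpp
#include <cassert>
#include <functional>

namespace sketch
{
template <typename T>
class Base
{
public:
    virtual ~Base() = default;
    virtual T impedance() const = 0;
};

template <typename T>
class Resistor final : public Base<T>
{
public:
    explicit Resistor(T r) : R(r) {}
    T impedance() const override { return R; }
private:
    T R;
};

template <typename T>
class Series final : public Base<T>
{
public:
    // Accepts any two elements derived from Base<T>, by lvalue reference.
    template <typename... Ports>
    explicit Series(Ports&... ports)
        : Series(std::reference_wrapper<Base<T>>(ports)...)
    {
        static_assert(sizeof...(Ports) == 2, "a Series adaptor connects two ports");
    }

    // Port impedance of a series adaptor: sum of the child impedances.
    T impedance() const override
    {
        return p1.get().impedance() + p2.get().impedance();
    }

private:
    // Only reachable through the delegating constructor above.
    Series(std::reference_wrapper<Base<T>> port1,
           std::reference_wrapper<Base<T>> port2)
        : p1(port1), p2(port2)
    {}

    std::reference_wrapper<Base<T>> p1;
    std::reference_wrapper<Base<T>> p2;
};
} // namespace sketch

inline double wiringDemo()
{
    sketch::Resistor<double> R1 { 250.0e3 };
    sketch::Resistor<double> R2 { 1.0e6 };
    sketch::Resistor<double> R3 { 1.0e3 };
    sketch::Series<double> S1 { R1, R2 }; // only the sample type is spelled out
    sketch::Series<double> S2 { S1, R3 }; // adaptors nest like any other port
    return S2.impedance();
}
```

Note that the virtual calls through `Base<T>` remain; whether the compiler can devirtualize them is exactly what a benchmark would have to show.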
Just a little thought on class naming for the WDF project. Maybe it's not your case, but I like using namespaces to classify the different classes, hide those that are internal, and allow special cases where classes with the same name work in different contexts. Let me explain the idea with a small theoretical example of specific models in a future context of non-linearity solving:

auto model = wdf::nl::model::triode::Dempwolf<float>();

or:

auto model = wdf::nl::model::bjt::GummelPoon<float, npn>();

or, in the same non-linear namespace, the (potentially multiple) solvers:

auto solver = wdf::nl::solver::NewtonRaphson<float>();

Because models and solvers potentially have a base class, the common practice is to hide the base class from the end user (high level) by using a
Another method is to put all the base classes in a single details namespace at the root namespace to keep the 'folders' clean. OK, in the context of Wave Digital Filter theory, we have elements that can be adapted and unadapted.
I think navigation within the framework is cleaner with this kind of organization. You can extend the concept by splitting the headers in the same way and by using a single
Maybe that answers your quest for a good name. Global rules:
This is just a suggestion, but I think it's good practice to keep the framework clean to use and develop. |
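The namespace layout suggested above could look like the following sketch. Everything in it is a placeholder: the root namespace `wdf_sketch`, the empty class bodies and the `npn` / `pnp` tag types are assumptions made only to show the organization, with base classes gathered in a `details` namespace.

```cpp
#include <cassert>
#include <type_traits>

namespace wdf_sketch // hypothetical root namespace, not the library's
{
namespace details // base classes, hidden from end users by convention
{
    template <typename T>
    struct ModelBase { using value_type = T; };
}

namespace nl // everything related to non-linear solving
{
    namespace model
    {
        namespace triode
        {
            template <typename T>
            struct Dempwolf : details::ModelBase<T> {};
        }

        namespace bjt
        {
            struct npn {}; // polarity tags (assumed; could also be an enum)
            struct pnp {};

            template <typename T, typename Polarity>
            struct GummelPoon : details::ModelBase<T> { using polarity = Polarity; };
        }
    }

    namespace solver
    {
        template <typename T>
        struct NewtonRaphson { using value_type = T; };
    }
}
} // namespace wdf_sketch

inline bool namingDemo()
{
    using namespace wdf_sketch::nl;
    auto triode = model::triode::Dempwolf<float>();
    auto transistor = model::bjt::GummelPoon<float, model::bjt::npn>();
    auto solver = solver::NewtonRaphson<float>();
    (void) triode; (void) transistor; (void) solver;
    return std::is_same<decltype(transistor)::polarity, model::bjt::npn>::value;
}
```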
Hi MaxC2 |
@MaxC2, very interesting, I'm not really familiar with I like your class naming and organization ideas too, definitely a lot more readable and organized than what I currently have. I'll take some time this weekend to re-work the code organization. I'll have to think about whether I want to keep the templated implementations in the same header files as the polymorphic ones.... Thanks, |
Hum. Your current implementation is already fast, and changes can affect the overall performance positively or negatively. That's why I think it's better to work on a third implementation in a development branch, keep the current version as it is, and run a benchmark once the third implementation is written. This is not a big task; most of the code stays in place, so this is more of a structural evolution than an optimization. That's one point of view. Sometimes it's better to restart from a clean sheet, particularly when the project is relatively simple. But it's your code, your repo, and you decide ;)

Today I've written a The C++ netlist parser can easily be rewritten in any scripting language needed for the final use. I use a generic and extensible technique to do the job, for example this map from component prefixes to element parsers:

std::map<std::string, std::function<void(std::string, Format&)>> components =
{
{
"R", [=](std::string line, Format& format)
{
auto&& [label, n1, n2, value] = tokenize_linear_component(line);
format.components.push_back(std::make_unique<Resistor>(label, n1, n2, value));
}
},
{
"C", [=](std::string line, Format& format)
{
auto&& [label, n1, n2, value] = tokenize_linear_component(line);
format.components.push_back(std::make_unique<Capacitor>(label, n1, n2, value));
}
},
[...]

The netlist is cleaned by a preprocessor to strip blank lines, comments (

static Format parse(stringarray& lines)
{
Format format; // netlist file format structure
// The title is a special type of comment and
// it is always the first line in the file.
format.title = lines.front();
auto [newlines, subcircuits] = preprocessor(lines);
if (lower(command_name(newlines.back())) != "end")
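// std::exception has no string constructor in standard C++ (MSVC extension); requires <stdexcept>
throw std::runtime_error("netlist parse: last line needs to be an 'end' circuit statement");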
for (size_t i = 0; i < newlines.size(); ++i)
{
auto line = newlines[i];
auto first = upper(line.substr(0, 1));
// is this a component definition?
if (components.contains(first))
components.at(first)(line, format);
// is this a command definition?
if (first == ".")
{
[...]

So, my idea is to follow Werner's 2016 dissertation and attempt to get the same results on the same examples (e.g. the tone stack). Once the procedure is correct and validated, we can easily think about the type of auto-generator we want and the language to use.

Ah yes, one great feature to implement in the WDF framework is the magnitude response of the circuit. Werner explains the procedure to do that. This kind of feature is very useful to check the validity of a circuit and to compare it with a SPICE simulation (or another circuit simulator). |
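The dispatch-map technique shown in the netlist parser excerpts earlier in this comment can be reduced to a self-contained sketch. It is not the parser from this thread: `Component`, `tokenizeLinearComponent`, `handlers` and `parseComponents` are hypothetical stand-ins, values are read as plain numbers (SPICE suffixes such as `k` or `u` are not handled), and lines without a registered handler are simply skipped.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch
{
// One parsed two-terminal component, e.g. "R1 in out 10e3".
struct Component
{
    char kind;          // 'R', 'C', 'L'
    std::string label;
    std::string n1, n2; // node names
    double value;       // plain number only, no SPICE unit suffixes
};

// Hypothetical stand-in for tokenize_linear_component.
inline Component tokenizeLinearComponent(const std::string& line)
{
    std::istringstream in(line);
    Component c {};
    if (!(in >> c.label >> c.n1 >> c.n2 >> c.value))
        throw std::runtime_error("netlist parse: malformed component line: " + line);
    c.kind = static_cast<char>(std::toupper(static_cast<unsigned char>(c.label[0])));
    return c;
}

using Handler = std::function<void(const std::string&, std::vector<Component>&)>;

// Maps the first character of a line to the function that parses it.
inline const std::map<char, Handler>& handlers()
{
    static const Handler linear =
        [](const std::string& line, std::vector<Component>& out)
        { out.push_back(tokenizeLinearComponent(line)); };
    static const std::map<char, Handler> table =
        { { 'R', linear }, { 'C', linear }, { 'L', linear } };
    return table;
}

inline std::vector<Component> parseComponents(const std::vector<std::string>& lines)
{
    std::vector<Component> out;
    for (const auto& line : lines)
    {
        if (line.empty())
            continue;
        const char first =
            static_cast<char>(std::toupper(static_cast<unsigned char>(line[0])));
        const auto it = handlers().find(first);
        if (it != handlers().end()) // comments and commands have no handler here
            it->second(line, out);
    }
    return out;
}
} // namespace sketch
```

Adding a new element type then only requires registering one more entry in the table, which is the extensibility argument made above.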
I guess I was more curious about how The netlist parsing definitely looks promising! I have some code for the Bassman tone stack lying around somewhere (with the SPQR tree derived by hand), so I'd be curious to compare that with whatever is generated from the netlist parser...
Could you point me to where in Kurt's paper he discusses this? I agree, that would be incredibly useful! Thanks, |
Oh, I've just seen your answer (GitHub doesn't show me the notifications anymore, I need to look into that).
A synopsis of the implementation is available here, so the reference wrapper class seems to hold a pointer. I hope the constexpr-compatible definition allows the compiler to produce code without overhead. I think it is something related to memory management: on the stack (reference) versus on the heap (pointer), where the stack is somewhat faster (but also more limited ... remember, I'm sure you've already experienced a stack overflow with some kind of buggy recursive code :) like me). A benefit of Managing the assembly code is perfect, but my assembly knowledge is limited to the 8-bit Z80 processor :)
Hum, I need to check that. But if I remember the idea, this is not really a pure internal WDF method but a technique to inject a signal into the structure, so something that can be a pure functional method in a utility namespace, for example. In: Wave Digital Filter Adaptors for Arbitrary Topologies and Multiport Linear Elements (DAFX-15) PDF Citation:
The full procedure is detailed in a Werner paper, but I need to find the correct reference. Let me look into that. Sincerely, |
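As a generic illustration of measuring a response by injecting a signal, here is a minimal sketch. It is not the procedure from the cited papers: it assumes the circuit is linear and time-invariant and is exposed as a per-sample callable, it feeds a unit impulse, and it evaluates the DFT of the output at a single frequency. `magnitudeAt` and the `OnePole` stand-in are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>

namespace sketch
{
// Magnitude response at freqHz of any per-sample processor y = process(x),
// estimated from the first numSamples samples of its impulse response.
template <typename Processor>
double magnitudeAt(Processor&& process, double freqHz, double sampleRate,
                   std::size_t numSamples)
{
    const double twoPi = 6.283185307179586;
    const double w = twoPi * freqHz / sampleRate; // radians per sample
    std::complex<double> acc(0.0, 0.0);
    for (std::size_t n = 0; n < numSamples; ++n)
    {
        const double x = (n == 0) ? 1.0 : 0.0; // unit impulse
        const double y = process(x);
        acc += y * std::polar(1.0, -w * static_cast<double>(n)); // single DFT bin
    }
    return std::abs(acc);
}

// Stand-in for a circuit: one-pole low-pass, y[n] = a*x[n] + (1 - a)*y[n-1].
struct OnePole
{
    double a;
    double state;
    double operator()(double x)
    {
        state = a * x + (1.0 - a) * state;
        return state;
    }
};
} // namespace sketch
```

The estimate is only meaningful if the impulse response has decayed within `numSamples`, and the processor must be reset between measurements because it carries state.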
Hi Max, Thanks for sharing the reference on Now for the polymorphic API (in Thanks too for tracking down Kurt's method for finding the frequency response. This approach definitely makes sense, although I was kind of hoping to find something that could find the magnitude/phase response at a given frequency just by inspection, rather than by analyzing a signal... my guess is that such a thing does not exist (yet), otherwise, Kurt would be the one to know :). Even still, using the method as described is definitely a useful utility to have along with this library. Thanks, |
Thank you very much for the work on this great framework, wonderful! The benchmark from the jatinchowdhury18 repo (wdf-bakeoff) shows excellent results. Great job. I'm the guy who coded JUCE's WDF++ header a few years ago.

I've just a quick question about wdf_t.h: what is the reason for using the macro-ish alias technique (SampleTypeHelpers) to access the generic parameters (duplicated into subclasses), instead of using protected base-class variables? Is this motivated by compiler optimisation, and related to the fact that polymorphism hurts performance (cf. the benchmark)?

About optimisation, I'm interested to see whether the constexpr technique (instead of inlining, which is old-school and not really pertinent with modern compilers) can improve the results. For instance, C++20 allows virtual on constexpr functions (e.g. the impedance calculation).

In any case, thank you very much for your work!!!