Begin merging production changes #197
base: main
Conversation
Sorry wrong button... Trying to abandon the "review".
X, Y = np.meshgrid(xp, yp)
DEFAULT_VEL_PROBES = [(x, y) for x, y in zip(X.ravel(), Y.ravel())]

# Pressure probes (spaced equally around the cylinder)
RADIUS = 0.5
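For illustration, a minimal sketch of how the velocity probe grid and equally spaced pressure probes around the cylinder could be built (the `xp`/`yp` ranges and probe counts here are hypothetical, not Hydrogym's actual values):

```python
import numpy as np

# Hypothetical probe coordinates; the real xp/yp grids come from the flow config.
xp = np.linspace(1.0, 10.0, 4)
yp = np.linspace(-2.0, 2.0, 3)

X, Y = np.meshgrid(xp, yp)
vel_probes = [(x, y) for x, y in zip(X.ravel(), Y.ravel())]

# Pressure probes spaced equally around a cylinder of radius 0.5
RADIUS = 0.5
n_pressure = 8
angles = np.linspace(0, 2 * np.pi, n_pressure, endpoint=False)
pressure_probes = [(RADIUS * np.cos(a), RADIUS * np.sin(a)) for a in angles]

print(len(vel_probes), len(pressure_probes))  # 12 8
```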
`RADIUS` does not seem to be used anymore; it was used by the previous pressure-probe definition.
self.initialize_state()

self.reset()

if config.get("restart"):
-    self.load_checkpoint(config["restart"])
+    self.load_checkpoint(config["restart"][0])
Why is the checkpoint an iterable now?
@@ -391,6 +489,7 @@ def reset(self, t=0.0) -> Union[ArrayLike, Tuple[ArrayLike, dict]]:
    self.flow.reset(q0=self.q0, t=t)
    self.solver.reset()

+   # Previously: return self.flow.get_observations(), info
`master` still returns only `self.flow.get_observations()` too. (Linked to the question of the transition to Gymnasium.)
@@ -242,7 +242,7 @@ def cyl_velocity_field(self):
    theta_lo = -0.5 * pi
    A_lo = ufl.conditional(
        abs(theta - theta_lo) < omega / 2,
-       pi / (2 * omega * self.rad**2) * ufl.cos(
+       -pi / (2 * omega * self.rad**2) * ufl.cos(
A critical fix that was not considered in #195? @jcallaham
I don't know, did I get this wrong? I think this is separate from the fixes in #195
Hello @jcallaham, if you have a moment to sort this out I would be very grateful, since I am writing my PhD thesis and I am about to launch experiments.
Thank you :)
I'm comparing with Appendix B in the referenced paper and it looks to me like the original was correct. @cl126162 @ludgerpaehler can anyone elaborate on this change? I implemented this as zero net mass flux (ZNMF), which is fairly typical of CFD actuation models, though that reference actually implemented it as two independent actuators, which is not really great in terms of boundary conditions and physicality. I guess changing the sign would make it so that both jets are blowing or sucking at once, rather than in opposition (as in the ZNMF configuration).
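The sign question can be illustrated numerically with a toy sketch (not Hydrogym code; a cos-shaped jet profile and the jet width `omega` are assumptions for illustration). With opposite signs on the two jets, blowing on one side cancels suction on the other, so the net mass flux is zero; flipping one sign makes both jets blow or suck together:

```python
import numpy as np

omega = 0.3  # hypothetical jet angular width
theta = np.linspace(-np.pi, np.pi, 100001)
dtheta = theta[1] - theta[0]

def jet(theta0, sign):
    # cos-shaped radial velocity profile on a window of width omega around theta0
    mask = np.abs(theta - theta0) < omega / 2
    return sign * np.cos(np.pi * (theta - theta0) / omega) * mask

# Opposite signs on the two jets: zero net mass flux (ZNMF)
znmf = jet(0.5 * np.pi, +1) + jet(-0.5 * np.pi, -1)
# Same sign on both jets: both blow (or suck) at once, nonzero net flux
same = jet(0.5 * np.pi, +1) + jet(-0.5 * np.pi, +1)

print(abs(znmf.sum() * dtheta))  # ~0: the two fluxes cancel
print(same.sum() * dtheta)       # nonzero net flux through the boundary
```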
thank you very much @jcallaham :)!
@@ -13,6 +13,36 @@


class Pinball(FlowConfig):
    rad = 0.5
`rad` is already defined below; removing the duplicate would let the code pass an unused-variable check (shipped with standard static checkers).
@@ -292,6 +298,43 @@ def solve(
        PDEBase: The state of the PDE at the end of the solve
    """
    for iter, t in enumerate(np.arange(*t_span, self.dt)):
+       iter = iter + start_iteration_value
It looks like `solve` and `solve_multistep` do almost the same thing. For a "public" release it may be better to merge them. :)
I agree - what's the distinction supposed to be here?
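For what a merged version could look like, here is a hedged sketch (names follow the diff, but `step` and the loop body are stand-ins for the real PDE solver): a single loop that absorbs the `start_iteration_value` offset so restarted runs keep a consistent global iteration counter.

```python
import numpy as np

def solve(t_span, dt, step, start_iteration_value=0, callbacks=()):
    """Sketch of a single solve loop (not the Hydrogym API).

    step(i, t) would advance the PDE one timestep; callbacks see the
    offset iteration index so restarts continue the global counter.
    """
    i = start_iteration_value
    for i, t in enumerate(np.arange(*t_span, dt)):
        i = i + start_iteration_value  # offset for restarted runs
        step(i, t)
        for cb in callbacks:
            cb(i, t)
    return i

# Usage: a fresh run, then a restart continuing the iteration count
log = []
last = solve((0.0, 0.5), 0.1, lambda i, t: log.append(i))
last = solve((0.5, 1.0), 0.1, lambda i, t: log.append(i),
             start_iteration_value=last + 1)
print(log)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```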
assert self.rewardLogCallback is not None, \
    f"Error: If num_sim_substeps_per_actuation ({self.num_sim_substeps_per_actuation}) " \
    "is set a reward callback function must be given, {self.rewardLogCallback}"
self.reward_aggreation_rule = env_config.get("actuation_config", {}).get(
What's the purpose of the `reward_aggregation_rule` for a gym interface where only instantaneous rewards/costs are considered? Is it the aggregation of the values in between decision times (times where the user hits `step`)?
(There is a typo in the variable name `reward_aggreation_rule`.)
Could this all happen in some kind of Wrapper? All of this seems to add a lot of complexity to an otherwise straightforward class.
Wrappers can deal with those aggregation rules and also action scaling.
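As a sketch of that idea (hypothetical names throughout; a real version would subclass `gym.Wrapper`, which is omitted here to keep the snippet dependency-free): the wrapper scales the action, runs the simulation substeps, and aggregates the intermediate rewards between decision times.

```python
import numpy as np

class AggregatingWrapper:
    """Sketch of a gym.Wrapper-style class handling action scaling and
    reward aggregation between decision times (hypothetical API)."""

    def __init__(self, env, control_scaling=1.0, substeps=1, rule="mean"):
        self.env = env
        self.control_scaling = control_scaling
        self.substeps = substeps
        self.aggregate = {"mean": np.mean, "sum": np.sum}[rule]

    def step(self, action):
        rewards = []
        for _ in range(self.substeps):
            # The action is held constant over all substeps (piecewise constant)
            obs, r, done, info = self.env.step(action * self.control_scaling)
            rewards.append(r)
            if done:
                break
        return obs, float(self.aggregate(rewards)), done, info

# Toy env whose reward is just the applied action, for demonstration
class ToyEnv:
    def step(self, action):
        return None, action, False, {}

env = AggregatingWrapper(ToyEnv(), control_scaling=2.0, substeps=4, rule="mean")
obs, reward, done, info = env.step(0.5)
print(reward)  # 1.0 (each substep sees the scaled action 0.5 * 2.0)
```

This keeps `FlowConfig` free of RL-specific state, since scaling and aggregation live entirely at the wrapper level.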
MAX_CONTROL = 0.1
MAX_CONTROL_LOW = 0.0
MAX_CONTROL_UP = 0.03
CONTROL_SCALING = 1.0
What does `CONTROL_SCALING` introduce now?
Answer: it is a multiplicative scaling of the input action used in the `step` method.
Why is this used? Why wouldn't you just scale the input action externally to the physics model of the flow?
t = self.iter * self.solver.dt
action = action * self.flow.CONTROL_SCALING

if self.num_sim_substeps_per_actuation is not None and self.num_sim_substeps_per_actuation > 1:
If I understand correctly, `num_sim_substeps_per_actuation` allows stretching the duration of the actions uniformly (actions given to the damped controller are piecewise constant). For this piecewise action signal, `num_sim_substeps_per_actuation` adjusts the length of the intervals on which the action is constant?
@@ -11,6 +11,18 @@


class Cavity(FlowConfig):
    # Velocity probes
Probes are used now and the old setting of A. Barbagallo, D. Sipp, P. J. Schmid - Closed-loop control of an open cavity flow using reduced-order models (2014) is abandoned?
Looks like it's still supported (as I think it should be), but not the default measurement.
Indeed, probes can be set optionally and the old setting can still be used.
self.iter += 1
self.t += self.dt
reward = self.get_reward()
Is the `else` statement above redundant, since `num_sim_substeps_per_actuation == 1` should be equivalent? (Not critical, and if things are in "production" it may be dangerous to modify the code too much...)
@@ -200,7 +196,9 @@ def render(self, mode="human", clim=None, levels=None, cmap="RdBu", **kwargs):


class RotaryCylinder(CylinderBase):
    MAX_CONTROL = 0.5 * np.pi
    MAX_CONTROL_LOW = -1.0
Contradicts the recent `MAX_CONTROL` update from #195.
Yeah, is this based on anything in particular? If not, having it just match the pinball would probably make sense.
else:
    CL, CD = averaged_objective_values
    # return np.square(CD) + lambda_value * np.square(CL)
    return np.abs(CD) + lambda_value * np.abs(CL)
Could you give details/references on this new loss at some point?
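For reference, the drag/lift penalty in the diff reduces to the following (the coefficient values and `lambda_value` below are hypothetical, chosen only to show the computation):

```python
import numpy as np

def objective(CL, CD, lambda_value):
    # |CD| + lambda * |CL|, as in the diff (the squared version is
    # commented out there: CD**2 + lambda * CL**2)
    return np.abs(CD) + lambda_value * np.abs(CL)

print(objective(CL=-0.3, CD=1.2, lambda_value=0.5))  # ≈ 1.35
```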
A brief summary (correct me if I am wrong) of the Hydrogym update:
- Cavity
- CylinderBase
- Pinball
- Code review
    "is set a reward callback function must be given, {self.rewardLogCallback}"
self.reward_aggreation_rule = env_config.get("actuation_config", {}).get(
    "reward_aggreation_rule", None)
assert self.reward_aggreation_rule in [
A reward aggregation rule is implemented and mandatory now; a default could possibly be set for backward compatibility?
Lots of good stuff, but I think we'll have to do some cleanup before it's really release-ready. Let me know if I can help with any of this. Are these basically the changes that made the baseline RL training happen?
High-level comments:
- Timesteppers don't use the actual time directly - that's only part of the ODE RHS, so in my opinion it makes more sense to continue having `t` be owned by `flow`. If needed for callbacks or controllers or whatever it can always be accessed by `flow.t`.
- The reward aggregation stuff looks useful, but the source code is fairly confusing. I would say it should either be cleaned up and clearly documented or (preferably) moved into a `gym.Wrapper`, unless this is typical as part of a `gym.Env` implementation.
- I really like the idea of having the 1D PDEs as cheap environments, but these implementations are completely independent of everything else in Hydrogym, meaning they don't follow the `PDEBase`/`TransientSolver`/`FlowEnv` design. If the goal is a "production" release, I think it might be better to hold off on these and then later implement them as part of a full "torch" backend or something. Otherwise it seems like these could just be in a totally separate repo.
- Generally my opinion is that the `FlowConfig` should be a physics model of the flow configuration and should not have anything related to reinforcement learning. That makes it so that the package could also be used for more traditional fluid dynamics and controls research. It should be possible to put all of that stuff either in some kind of callback or at the Env level.
@@ -272,12 +276,14 @@ def __init__(self, flow: PDEBase, dt: float = None):
    if dt is None:
        dt = flow.DEFAULT_DT
    self.dt = dt
+   self.t = 0.0
Currently the time is owned by the flow, not the solver. I guess you could do it the other way around, but since the "state" (velocity, pressure, etc.) is owned by the flow class I figured it made more sense to have time there as well. Then it is advanced with `flow.advance_time(dt)` here. On the other hand, the solver doesn't directly depend on time, only the timestep, so my feeling is that it doesn't naturally belong there.
Actually, I think the only place it's needed is for the "callbacks" or "controller", but now that time is a property of the flow it doesn't need to be there at all (or the callback could be passed `flow.t` instead of calculating it within the loop).
1. I agree, a solver object can be viewed as a mapping (implemented by a `solve` method) taking as input the current time $t$ (for non-autonomous systems), the current state $x_t$, and a time duration $T$, and returning the solution $x_{t+T}$, $T$ time units later (this mapping is called a flow in maths). But the mapping in itself is independent of the time! Hence, initialising it with `self.t = 0.0` may not really be needed.
2. Practically, I don't see any use of `self.t` in this class, so it seems safe to remove it?
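The arrangement described above can be sketched in a few lines (these are toy stand-ins, not the Hydrogym classes): the flow owns `t` and its reset, while the solver keeps only the timestep and advances the flow's clock.

```python
class Flow:
    """Toy flow model: owns the state and, with it, the time."""
    def __init__(self):
        self.t = 0.0

    def advance_time(self, dt):
        self.t += dt

    def reset(self, t=0.0):
        self.t = t  # resetting the flow also resets the time

class Solver:
    """Toy solver: depends only on the timestep, not on absolute time."""
    def __init__(self, flow, dt):
        self.flow, self.dt = flow, dt

    def step(self):
        # ... advance the PDE state here ...
        self.flow.advance_time(self.dt)

flow = Flow()
solver = Solver(flow, dt=0.1)
for _ in range(5):
    solver.step()
print(round(flow.t, 10))  # 0.5; callbacks can read flow.t directly
```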
@@ -317,20 +360,25 @@ def step(self, iter: int, control: Iterable[float] = None, **kwargs):

def reset(self):
    """Reset variables for the timestepper"""
-   pass
+   self.t = 0.0
See above - time should be owned by the flow, and the reset method for the flow can also reset the time.
self.solver: TransientSolver = env_config.get("solver")(
    self.flow, **env_config.get("solver_config", {}))
self.callbacks: Iterable[CallbackBase] = env_config.get("callbacks", [])
self.rewardLogCallback: Iterable[CallbackBase] = env_config.get(
Should use snake_case for consistent style.
# if ((iter + 1) * self.dt * 10e6) % (self.DT * 10e6) == 0:
if (iter + 1) % self.num_sim_substeps_per_actuation == 0:
    self.data = np.array(self.data)
    flow.reward_array = self.data
Is there another way to do this? The flow is a physics model and so RL-specific stuff should be abstracted
Basically, transforming this class into a `gym.Wrapper`.
q=None,
qB=None,
averaged_objective_values=None,
return_objective_values=False):
Same comments as on the `evaluate_objective` changes for the cylinder.
@@ -345,14 +355,20 @@ def velocity_probe(self, probes, q: fd.Function = None) -> list[float]:
    if q is None:
        q = self.q
    u = q.subfunctions[0]
    return np.stack(u.at(probes)).flatten("F")

    # probe_values = np.stack([u.at(probe) for probe in probes]).flatten("F")
Lots of commented-out code here (and throughout the PR).
@@ -47,12 +49,13 @@ class PDEBase(metaclass=abc.ABCMeta):


def __init__(self, **config):
    self.mesh = self.load_mesh(name=config.get("mesh", self.DEFAULT_MESH))
+   self.reward_lambda = config.get("reward_lambda", 0.0)
This doesn't seem to be used for anything. Also not related to the physics modeled by the `PDEBase`.
@@ -70,13 +70,21 @@ seaborn = "^0.12.1"
[tool.yapf]
based_on_style = "yapf"

[tool.yapfignore]
ignore_patterns = [
    "hydrogym/experimental/*"
Why does this get a pass on formatting?
I agree.
I think a
I agree!
I agree. From an outside point of view and by reading @jcallaham's comments, the new code typically looks like a fork devoted to research experiments and publication. Indeed, it does not totally fit in the
I could help with this task by the end of Nov. In the end, the "features" provided by this new code are the introduction of inter-decision time and reward aggregation, probes, a modification of the magnitudes of the control bounds, and a sign fix in the dynamics. It would be interesting to have an explanation of why those changes are needed and how they impact the system.
No description yet.
TODO: