Hi all,
I just followed the example from the documentation website (https://skrl.readthedocs.io/en/develop/_downloads/118e328d79ea6d72a3c818e23e77e7ee/torch_gymnasium_pendulum_ppo.py).
I ran that file (torch_gymnasium_pendulum_ppo.py) for 100,000 epochs and then wrote a script for testing. The top of the testing script is the same as the training script (a sketch of that shared top part is included after the script for reference); the bottom part of the testing script is shown below.
agent = PPO(models=models,
            memory=memory,
            cfg=cfg,
            observation_space=env.observation_space,
            action_space=env.action_space,
            device=device)

# Load the checkpoint
agent.load("./runs/torch/Pendulum/25-01-24_17-11-38-827106_PPO/checkpoints/best_agent.pt")
model = models["policy"].to(device=device)

# Create the environment
env = gym.make('Pendulum-v1', render_mode='human')
observations, info = env.reset()

# Simulation loop
num_steps = 100000  # Total number of steps for the simulation
for step in range(num_steps):
    # Prepare the input for the model
    inputs = {"states": torch.from_numpy(observations).to(device=device)}
    # Get actions from the model
    actions = model(inputs)[0].cpu().detach().numpy()
    # Gymnasium's step() returns (obs, reward, terminated, truncated, info)
    observations, rewards, terminated, truncated, info = env.step(actions)
    print("Rewards:", rewards)
    # If the episode is over, reset the environment
    if terminated or truncated:
        observations, info = env.reset()

# Close the environment when done
env.close()
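In case it is useful, here is roughly what the "top" part shared with the training script looks like. This is only a sketch based on the linked skrl Pendulum PPO example: the network sizes, memory size, and PPO configuration values below are placeholders, and the real ones are whatever that example file defines.

import gymnasium as gym
import torch
import torch.nn as nn

from skrl.agents.torch.ppo import PPO, PPO_DEFAULT_CONFIG
from skrl.envs.wrappers.torch import wrap_env
from skrl.memories.torch import RandomMemory
from skrl.models.torch import DeterministicMixin, GaussianMixin, Model


# Policy: stochastic Gaussian model; Value: deterministic model
class Policy(GaussianMixin, Model):
    def __init__(self, observation_space, action_space, device,
                 clip_actions=False, clip_log_std=True, min_log_std=-20, max_log_std=2):
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std)

        self.net = nn.Sequential(nn.Linear(self.num_observations, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, self.num_actions))
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def compute(self, inputs, role):
        # Scale the mean to Pendulum-v1's action range [-2, 2]
        return 2 * torch.tanh(self.net(inputs["states"])), self.log_std_parameter, {}


class Value(DeterministicMixin, Model):
    def __init__(self, observation_space, action_space, device, clip_actions=False):
        Model.__init__(self, observation_space, action_space, device)
        DeterministicMixin.__init__(self, clip_actions)

        self.net = nn.Sequential(nn.Linear(self.num_observations, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def compute(self, inputs, role):
        return self.net(inputs["states"]), {}


# Environment, device, models, memory and PPO configuration
env = wrap_env(gym.make("Pendulum-v1"))
device = env.device

models = {"policy": Policy(env.observation_space, env.action_space, device),
          "value": Value(env.observation_space, env.action_space, device)}

memory = RandomMemory(memory_size=1024, num_envs=env.num_envs, device=device)

cfg = PPO_DEFAULT_CONFIG.copy()
cfg["rollouts"] = 1024  # placeholder; the real values are in the linked example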
=============================================
The result shows that the reward never reaches 0 as I anticipated (in Pendulum-v1 the per-step reward is always at most 0, with 0 corresponding to the pendulum being upright and motionless with no torque applied). After 100,000 epochs of training, can you tell me what is happening here, and why the agent never reaches the goal?
Thank you,