An important step in RL is obtaining the environment state. In Gym, the step function provides it. step returns 4 variables:
observation (the environment state)
reward (the reward obtained from the previous action)
done (a flag indicating whether the termination condition has been reached)
info (diagnostic information for debugging)
Calling reset restarts the whole environment from the beginning; in addition, reset returns an initial environment state.
```python
import gym

env = gym.make('CartPole-v0')
for i_episode in range(1):  # how many episodes you want to run
    observation = env.reset()  # reset() returns initial observation
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
```
Run the code above and you will see the environment state received at each step printed continuously in the terminal.
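The example above throws the reward away. As a minimal sketch, using the same CartPole-v0 environment and the same four-tuple step API as above, you can accumulate the reward to measure how long the pole stays up (in CartPole-v0 each surviving step is worth +1):

```python
import gym

env = gym.make('CartPole-v0')
for i_episode in range(3):
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()  # random policy
        observation, reward, done, info = env.step(action)
        total_reward += reward  # +1 per step survived in CartPole-v0
    print("Episode {} total reward: {}".format(i_episode + 1, total_reward))
env.close()
```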
Space
Besides the observation, the other key ingredient in RL is defining the actions that can be taken; both are defined by a space.
You can use the code below to inspect the action space and the observation space.
```python
import gym

env = gym.make('CartPole-v0')

## Check dimension of spaces ##
print(env.action_space)       #> Discrete(2)
print(env.observation_space)  #> Box(4,)

## Check range of spaces ##
"""
print(env.action_space.high)
- You'll get an error if you run this, because a 'Discrete' object has no attribute 'high'
"""
print(env.observation_space.high)
print(env.observation_space.low)
```
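Beyond inspecting the bounds, both space types support sample() and contains(), which is handy for sanity-checking actions; a small sketch with the same environment:

```python
import gym

env = gym.make('CartPole-v0')

# Draw random elements from each space
print(env.action_space.sample())       # a random action: 0 or 1
print(env.observation_space.sample())  # a random point in the observation space

# Check membership and size
print(env.action_space.contains(1))    # True: 1 is a valid action
print(env.action_space.contains(2))    # False: only {0, 1} are allowed
print(env.action_space.n)              # 2, the number of discrete actions
```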
A space can also be replaced by hand, for example restricting CartPole's action space to a single action:

```python
import gym
from gym import spaces

env = gym.make('CartPole-v0')
env.action_space = spaces.Discrete(1)  # set it to only 1 element: {0}

for i_episode in range(5):  # how many episodes you want to run
    observation = env.reset()  # reset() returns initial observation
    for t in range(200):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
```
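For reference, here is a sketch of constructing spaces directly with gym.spaces.Discrete and gym.spaces.Box, the two types CartPole uses (the specific bounds below are illustrative, not from CartPole):

```python
import numpy as np
from gym import spaces

# Discrete(n): the integers {0, 1, ..., n-1}
disc = spaces.Discrete(3)
print(disc.sample())  # random action in {0, 1, 2}

# Box: a continuous space with per-dimension bounds
box = spaces.Box(low=np.array([-1.0, 0.0]),
                 high=np.array([1.0, 10.0]),
                 dtype=np.float32)
print(box.sample())  # random point inside the bounds
print(box.shape)     # (2,)
```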