CS 201 - Computer Science I - Spring 2004
Project 4
Loyola College, Department of Computer Science
Due
Tuesday, April 27th at 11:59pm (extended from Monday, April 26th).
Projects will be accepted one day late without penalty. Projects two
days late will be assessed a 40% penalty. Each additional day late brings
an additional 20% penalty.
Projects will not be accepted more than four days late.
Objectives
- to use arrays, 1-D and 2-D
Introduction
Artificial intelligence is a sub-field of computer science that deals with algorithms which enable programs and physical machines to operate independently of humans. In this project we will use one of these algorithms to teach a robot how to navigate a maze. The algorithm teaches the robot in much the same way a pet owner might train a dog. The owner rewards the dog for behavior she wants to encourage and punishes the dog for behavior she wants to discourage. Usually the reward is something the dog likes to receive, such as food or attention. Robots have no innate desire for anything; however, we can program an algorithm to "reward" our robot with numbers.
The way our robot will learn is by exploring the maze and assigning numeric values to each of the positions in the maze. Positions with higher values will be interpreted as more desirable. After the robot has navigated the maze a few times, it will "learn" how to get from the starting position to the end position based on the relative values of these numbers.
The algorithm we will use in this program comes from the branch of artificial intelligence known as reinforcement learning. Reinforcement learning never tells a robot explicitly how to accomplish the task. Instead, it creates a model of the world and rewards the program for completing the task. The power of this particular approach is that the program is not hampered by a human's preconceived notions about how that task should be completed.
Gridworld Problem
The problem we will study is known as the Gridworld problem. Gridworlds are 2-dimensional rectangular grids where each individual cell corresponds to a position in the world. An example appears below. In Figure 1, "S" refers to the starting position and "G" refers to the goal position. In the Gridworld, a robot has four possible movements: north, south, east, and west. The robot will not be allowed to move off the grid or onto a position that is blocked. To reward the robot for reaching the goal, we will give it a reward of +1 when it moves onto the goal position. All other moves will receive a reward of 0. The robot will use the reward to develop a mental model that tells it which cells in the Gridworld are likely to lead to a reward.

Fig. 1
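The movement and reward rules described above can be sketched in Java as follows. This is only an illustration: the class name GridRules, the boolean[][] representation of blocked cells, and the method names are assumptions made for this sketch and are not part of the supplied code.

```java
/** Sketch of the Gridworld movement rules; all names here are illustrative. */
class GridRules {
    // Row/column offsets for north, south, east, and west.
    static final int[][] MOVES = { {-1, 0}, {1, 0}, {0, 1}, {0, -1} };

    /** Returns true if (row, col) lies on the grid and is not blocked. */
    static boolean isLegal(boolean[][] blocked, int row, int col) {
        return row >= 0 && row < blocked.length
            && col >= 0 && col < blocked[row].length
            && !blocked[row][col];
    }

    /** Reward for moving onto (row, col): +1 at the goal, 0 everywhere else. */
    static double reward(int row, int col, int goalRow, int goalCol) {
        return (row == goalRow && col == goalCol) ? 1.0 : 0.0;
    }
}
```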
Before training begins, the robot will be given an initial estimate of the value of each cell in the Gridworld. All the cells except the goal position will have a value of 0. The goal position will have a value of 1. As the robot explores the Gridworld, it will modify the value of the positions to determine the relative importance of the cells according to the following formula:
where V(curPos) = value of the robot's current position
V(nextPos) = value of the next position the robot will move to
reward = reward the robot received for being in nextPos.
α = 0.1
γ = 0.4
The values α and γ are constants necessary for learning. They were chosen empirically to minimize the number of trials the robot needs in order to learn the path to the goal position.
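The symbols defined above are consistent with the standard temporal-difference update from reinforcement learning, V(curPos) = V(curPos) + α[reward + γV(nextPos) - V(curPos)]. The sketch below assumes that form, so check it against the formula given for this project before relying on it. The class and method names are illustrative only.

```java
/** Sketch of one value update, assuming the standard temporal-difference form. */
class TdUpdate {
    static final double ALPHA = 0.1; // learning constant alpha
    static final double GAMMA = 0.4; // learning constant gamma

    /**
     * Returns the new value of the current position.
     * vCur   = V(curPos), vNext = V(nextPos),
     * reward = reward received for being in nextPos.
     */
    static double updatedValue(double vCur, double vNext, double reward) {
        return vCur + ALPHA * (reward + GAMMA * vNext - vCur);
    }
}
```

For example, if the robot stands on a cell with value 0 and steps onto the goal (value 1, reward 1), the cell's new value is 0 + 0.1 × (1 + 0.4 × 1 - 0) = 0.14, up to floating-point rounding.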
The robot will move around the Gridworld in search of the reward. When choosing the next move, the robot will assess the values of the four adjacent cells. Ninety percent of the time, the robot moves to the position that is most highly valued as in Figure 2. If there are multiple positions of the same value, it should choose randomly among the best positions as in Figure 3. Ten percent of the time, the robot will move randomly. In this case the robot disregards the values of the states and chooses any adjacent position as shown in Figure 4.
This random behavior will introduce some variability into the robot's movements, thereby allowing it to explore alternate paths to the goal.
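One way to sketch this move-selection rule in Java is shown below. It assumes the caller has already collected the values of the legal adjacent cells into an array; the class name MoveChooser and the method choose are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Sketch of the 90% best-move / 10% random-move rule; names are illustrative. */
class MoveChooser {
    static final double EPSILON = 0.1; // chance of taking a random move

    /**
     * values[i] is the value of the i-th legal adjacent cell (non-empty array).
     * Returns the index of the cell the robot should move to.
     */
    static int choose(double[] values, Random rng) {
        // Ten percent of the time, ignore the values and pick any neighbor.
        if (rng.nextDouble() < EPSILON) {
            return rng.nextInt(values.length);
        }
        // Otherwise find the highest value among the neighbors.
        double best = values[0];
        for (double v : values) {
            best = Math.max(best, v);
        }
        // Collect every neighbor that ties for the best value.
        List<Integer> ties = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            if (values[i] == best) {
                ties.add(i);
            }
        }
        // Choose randomly among the best positions.
        return ties.get(rng.nextInt(ties.size()));
    }
}
```

Passing the Random object in as a parameter makes it possible to use a fixed seed while debugging, so the same sequence of moves can be replayed.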
Example Applet
Above you see a working demonstration of the program you will create. The moving text of "r#" represents a robot move. This applet shows the last ten moves of the robot. Each move is numbered so you can see how long the robot has been exploring the Gridworld. It usually takes slightly over 1000 moves to find the goal position the first time. With each successive trial, it becomes much faster, since the robot has assigned positive values to the cells nearest the goal. The colors indicate the value of the cells. The darker blue the cell is, the higher it is valued. Any position with a value of 0 is colored white.
Assignment
Write two classes, Robot<Your Name> and RobotWorld<Your Name>. Your RobotWorld class will be an applet whose only method is paint. This method will create the world by specifying its dimensions and creating Location objects for the start position (labeled "S" above), the goal position (labeled "G" above), and the blocked positions (shown in black). The paint method will then instantiate a Robot object and a DrawMaze object. The purpose of the Robot object is to learn values for the positions in the Gridworld. The DrawMaze object will color code the values of the Gridworld and draw them on the screen.
Part 1: Creating the Robot
For this class you will use the following data fields:
- world - 2D array to store the values of the positions in the grid world
- blocked positions - an array that stores the positions that are blocked
- start - the starting position
- goal - the goal (or finish) position
- current - the current position of the robot
- epsilon - the chance of taking a random move which is set to 0.1
- alpha(α) - a learning constant set to 0.1
- gamma(γ) - another learning constant set to 0.4
- goal reward - the reward the robot earns for moving into the goal position (i.e., when the next position is the goal position), which is set to 1
You may not add any more data fields to the Robot class; however, you may add constants as needed.
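As a rough guide, the data fields listed above might be declared along the following lines. The API of the supplied Location class is not shown on this page, so the sketch uses a hypothetical stand-in called Cell; the class name RobotSketch, the constructor parameters, and the valueAt helper are likewise assumptions for illustration only.

```java
/** Sketch of the Robot data fields; names and types are assumptions. */
class RobotSketch {
    /** Hypothetical stand-in for the supplied Location class. */
    static class Cell {
        final int row, col;
        Cell(int row, int col) { this.row = row; this.col = col; }
    }

    private double[][] world;   // value of each position in the Gridworld
    private Cell[] blocked;     // positions the robot may not enter
    private Cell start;         // starting position
    private Cell goal;          // goal (finish) position
    private Cell current;       // where the robot is right now
    private final double epsilon = 0.1;    // chance of taking a random move
    private final double alpha = 0.1;      // learning constant
    private final double gamma = 0.4;      // learning constant
    private final double goalReward = 1.0; // reward for moving onto the goal

    RobotSketch(int rows, int cols, Cell[] blocked, Cell start, Cell goal) {
        world = new double[rows][cols];   // Java initializes every entry to 0.0
        world[goal.row][goal.col] = 1.0;  // the goal starts with a value of 1
        this.blocked = blocked;
        this.start = start;
        this.goal = goal;
        this.current = start;
    }

    /** Returns the current value estimate for the given position. */
    double valueAt(int row, int col) {
        return world[row][col];
    }
}
```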
You will also need to write the following methods:
Part 2: Training the robot
Once you have initialized all the values in the paint method, you are ready to train the robot. Training will involve several trials. For each trial, you will move the robot to the starting location and then allow it to explore the maze. You can use the method drawInclRobot which is part of the DrawMaze class to show the current location of your robot as it wanders around the Gridworld. If you want to see your robot more easily, you may want to add a delay between each step the robot makes. Once the robot has found the goal position, erase the current Gridworld so there aren't any remnants of the past trial on the screen and redraw it using the draw method in the DrawMaze class.
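The training procedure described above can be sketched as a pair of nested loops. This is a simplified illustration, not the required design: it uses plain int arrays instead of Location objects, assumes the value update has the standard temporal-difference form V(curPos) = V(curPos) + α[reward + γV(nextPos) - V(curPos)], and omits all drawing because the DrawMaze method signatures are not shown on this page. The names TrainingSketch, train, and chooseMove are hypothetical.

```java
import java.util.Random;

/** Sketch of the training loop; names and data layout are assumptions. */
class TrainingSketch {
    static final double EPSILON = 0.1, ALPHA = 0.1, GAMMA = 0.4, GOAL_REWARD = 1.0;
    static final int[][] MOVES = { {-1, 0}, {1, 0}, {0, 1}, {0, -1} }; // N, S, E, W

    /** Runs the given number of trials and returns the step count of each one. */
    static int[] train(double[][] world, boolean[][] blocked,
                       int[] start, int[] goal, int trials, Random rng) {
        int[] steps = new int[trials];
        for (int t = 0; t < trials; t++) {
            int row = start[0], col = start[1]; // move the robot back to the start
            while (row != goal[0] || col != goal[1]) {
                int[] next = chooseMove(world, blocked, row, col, rng);
                boolean atGoal = next[0] == goal[0] && next[1] == goal[1];
                double reward = atGoal ? GOAL_REWARD : 0.0;
                // Assumed temporal-difference update of the current cell.
                world[row][col] += ALPHA
                    * (reward + GAMMA * world[next[0]][next[1]] - world[row][col]);
                row = next[0];
                col = next[1];
                steps[t]++;
                // A real solution would draw the robot here, possibly with a delay.
            }
        }
        return steps;
    }

    /** Picks the next cell: random 10% of the time, otherwise a best-valued neighbor. */
    static int[] chooseMove(double[][] world, boolean[][] blocked,
                            int row, int col, Random rng) {
        int[][] legal = new int[MOVES.length][];
        int n = 0;
        for (int[] m : MOVES) {
            int r = row + m[0], c = col + m[1];
            if (r >= 0 && r < world.length && c >= 0 && c < world[r].length
                    && !blocked[r][c]) {
                legal[n++] = new int[] { r, c };
            }
        }
        if (n == 0) {
            throw new IllegalStateException("robot has no legal move");
        }
        if (rng.nextDouble() < EPSILON) {
            return legal[rng.nextInt(n)]; // exploratory random move
        }
        double best = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            best = Math.max(best, world[legal[i][0]][legal[i][1]]);
        }
        int ties = 0;
        for (int i = 0; i < n; i++) {
            if (world[legal[i][0]][legal[i][1]] == best) ties++;
        }
        int pick = rng.nextInt(ties); // choose randomly among the tied best cells
        for (int i = 0; i < n; i++) {
            if (world[legal[i][0]][legal[i][1]] == best && pick-- == 0) {
                return legal[i];
            }
        }
        throw new IllegalStateException("unreachable: a tied cell always exists");
    }
}
```

The sketch assumes the start and goal are different cells and that the goal can be reached from the start; otherwise the inner loop would never end.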
You should create two different robots that learn different Gridworlds. One of the robots can learn the Gridworld shown in the example on this web page and defined in GridWorldDemo. The other should be larger, have different locations for the start and goal positions, as well as different blocked squares.
Supplied Code
DrawMaze.java
RobotMove.java
Location.java
Extra Credit
One of the reasons reinforcement learning is so powerful is that the robot can adapt if the environment is changed. For extra credit, modify the Gridworld after the robot has learned how to reach the goal. When modifying the world, you may change the goal position and/or the blocked positions and see how long it takes the robot to find the new goal. Run several experiments and determine on average how many steps it takes for the robot to find the goal. Turn in your extra credit as a separate class. Also turn in a graph that shows the average number of steps for the first 20 trials on the modified Gridworld.
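If you record the number of steps for every trial of every experiment, the per-trial averages for the graph can be computed with a small helper like the one sketched here. The class name StepAverages and the layout of the steps array are assumptions for this illustration.

```java
/** Sketch of averaging step counts across experiments; names are illustrative. */
class StepAverages {
    /**
     * steps[e][t] is the number of steps taken in trial t of experiment e.
     * Every experiment must have the same number of trials.
     * Returns the average number of steps for each trial.
     */
    static double[] averagePerTrial(int[][] steps) {
        int trials = steps[0].length;
        double[] avg = new double[trials];
        for (int[] experiment : steps) {
            for (int t = 0; t < trials; t++) {
                avg[t] += experiment[t]; // accumulate the total for trial t
            }
        }
        for (int t = 0; t < trials; t++) {
            avg[t] /= steps.length; // divide by the number of experiments
        }
        return avg;
    }
}
```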
Click here to see an example.
Grading
- 70% Execution. Partial credit will be granted depending on how
much progress you made towards the correct result.
- 20% Design. The design score is based on how easy it is to
follow the logic of your code, how well you avoided repetitive code,
and how easy it would be to change your code if certain specifications
changed.
- 10% Style. Style includes comments, indentation, and choice of
variable and method names.
- Substantial progress (complete parts 1 and 2) must be made towards correct execution to
earn the points for design and style.
Submissions
Submit the source code (.java files) for your RobotWorld<Your Name> applet
and your Robot<Your Name> class.