-
This seems closely related to the process interaction formalisms for discrete event simulation. See e.g. Jacobs & Verbraeck (2004).
-
@projectmesa/maintainers, I would love to hear everybody's thoughts! @harshmahesheka, you spent quite a bit of time on Reinforcement Learning with Mesa. Would you view this as a useful construct to build learning models on?
-
Tasks turned out to not really be useful on their own. Tasks become really powerful when an Agent can initiate (schedule) them based on its current state (and maybe based on requests from other agents). Therefore, you need a few additional components.
Ideally you want this modular and extendable, so that you can use RL to tune loss functions and/or decision logic. Scope creep is hitting hard. Going to spend some more brain cycles on how to make this minimal. If anyone knows of something that already does this, recommendations are highly welcome (again @harshmahesheka).
-
While this might be useful to Mesa, I wonder if this would be best external to Mesa, to start as an add-on? It conceptually structures the tasks around the current time construct, where a task takes X units of time to complete, with all sorts of different ways to structure that.
-
Introduction
ABM started quite simply, with Agents doing (or not doing) things in a fixed timestep. Then we evolved it to allow agents to do things at any time, first with the DiscreteEventScheduler and then even better with full support for discrete event scheduling. However, Mesa doesn't really have a concept of a time duration to do something. You can emulate it by scheduling the next task for an agent some duration in advance, but this has several limitations.
To solve all of this, I would like to propose the Task.
The Task
A Task is an object that represents a specific activity an Agent is performing over a period of time, with a defined duration, priority, and completion state. It encapsulates all the information needed to track what an Agent is doing, when they started, when they'll finish, what happens upon completion, and how it can be interrupted.
In my vision, it has a few handy properties, including a duration, a priority, a completion state, and hooks for completion and interruption; a rough sketch follows below.
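To make this concrete, here is a minimal sketch of what such an object could look like. All names (`Task`, `TaskState`, `advance`, `interrupt`) are hypothetical illustrations, not existing Mesa API:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional


class TaskState(Enum):
    ACTIVE = auto()
    INTERRUPTED = auto()
    COMPLETED = auto()


@dataclass
class Task:
    name: str
    duration: float                      # time units needed to complete
    priority: int = 0                    # higher values win when tasks compete
    elapsed: float = 0.0                 # progress so far
    state: TaskState = TaskState.ACTIVE
    on_complete: Optional[Callable[[], None]] = None

    def advance(self, dt: float) -> None:
        """Progress the task by dt time units; fire the callback when done."""
        if self.state is not TaskState.ACTIVE:
            return
        self.elapsed += dt
        if self.elapsed >= self.duration:
            self.state = TaskState.COMPLETED
            if self.on_complete is not None:
                self.on_complete()

    def interrupt(self) -> None:
        """Pause the task, keeping its progress so it can be resumed later."""
        if self.state is TaskState.ACTIVE:
            self.state = TaskState.INTERRUPTED

    def resume(self) -> None:
        if self.state is TaskState.INTERRUPTED:
            self.state = TaskState.ACTIVE
```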
Examples
Let's talk through a few examples.
Wolf-Sheep
Imagine you are a sheep. Most of the day you're searching for vegetation and eating it. This could be modelled as a relatively long task; you might do this for hours a day. However, you're also always spending some attention on spotting wolves. Once you spot one, you break off your food-search task and start a high-priority fleeing task.
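A rough sketch of that interruption logic, reusing the hypothetical `Task` class from above; `spot_wolf()` and all numbers are made up for illustration:

```python
import random


def spot_wolf() -> bool:
    # Stand-in for a real perception check against nearby wolves.
    return random.random() < 0.05


def sheep_step(current: Task) -> Task:
    """One tick of sheep behaviour: fleeing pre-empts grazing."""
    if spot_wolf():
        flee = Task("flee", duration=3, priority=10)
        if flee.priority > current.priority:
            current.interrupt()  # grazing progress is kept for later
            return flee
    current.advance(dt=1)
    return current


grazing = Task("graze", duration=240, priority=1)  # hours-long, low priority
```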
Ant Colony
Consider ants building a new nest chamber. Individual ants perform a "Dig Soil" task with progress determined by soil hardness. When they've gathered enough soil, they start a "Transport Material" task with duration based on distance and load. However, if they encounter a colony member with a stronger urgent pheromone signal, they can interrupt their current task to assist - perhaps there's a cave-in that requires immediate attention. The reward function for construction tasks is non-linear: partial nest chambers provide little benefit, but completing rooms gives large colony-wide bonuses. This creates emergent cooperation where ants dynamically adjust their tasks based on colony needs and local information.
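The non-linear reward mentioned here could be as simple as a step over task progress; a small sketch with arbitrary numbers:

```python
def chamber_reward(progress: float) -> float:
    """Partial chambers are nearly worthless; completed ones pay off big."""
    if progress >= 1.0:
        return 100.0          # full chamber: large colony-wide bonus
    return 2.0 * progress     # partial work: marginal benefit
```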
Medieval Market
Picture merchants in a medieval marketplace. Each has a primary "Sell Goods" task that's continuously active but with variable reward rates based on time of day and location. They can temporarily interrupt selling to perform a "Haggle" task when high-value customers appear - this has an uncertain duration and could result in either high rewards or wasted time. Meanwhile, they're also managing a "Watch for Thieves" task that runs in parallel at low efficiency. If they spot suspicious behavior, they can initiate an immediate "Protect Goods" task that prevents theft but halts all trading. The priority of these tasks shifts based on crowd density, reported thefts, and remaining inventory - creating dynamic market behaviors.
Desire-based modelling
One of the things that this might allow is what I call desire-based modelling. Agents have internal desires they want to fulfil, and they have states tracking how much these desires are fulfilled. These desires can decline based on time, the environment or the tasks involved. For example, agents might have desires to stay warm, be fed and have sex. The larger the gap between an agent's state and their desired state, the higher a task can be prioritized.
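A minimal sketch of that priority rule, assuming each desire has a current and a desired fulfilment level (all names and weights are made up):

```python
def task_priority(current: dict[str, float], desired: dict[str, float],
                  desire: str, weight: float = 1.0) -> float:
    """Priority grows with the gap between desired and current fulfilment."""
    gap = desired[desire] - current[desire]
    return weight * max(gap, 0.0)


# A cold but well-fed agent prioritizes warming up over eating:
state = {"warmth": 0.2, "food": 0.9}
goal = {"warmth": 1.0, "food": 1.0}
print(task_priority(state, goal, "warmth"))  # 0.8
print(task_priority(state, goal, "food"))    # ~0.1
```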
(Future) ideas to further develop
Priorities and scheduling
It seems obvious that once you have tasks, you would like a system for agents to automatically order and schedule them. Some mechanisms to do that would be useful, based on priorities, urgency, desires and/or expected rewards.
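One straightforward mechanism would be a priority queue per agent; a sketch using Python's heapq (a min-heap, hence the negated priority):

```python
import heapq
from typing import Any


class TaskQueue:
    """Orders pending tasks so the highest-priority one is picked first."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, Any]] = []
        self._counter = 0  # tie-breaker: insertion order among equal priorities

    def push(self, task: Any, priority: float) -> None:
        heapq.heappush(self._heap, (-priority, self._counter, task))
        self._counter += 1

    def pop(self) -> Any:
        return heapq.heappop(self._heap)[2]
```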
Time-sensitive rewards
Generally there is a benefit of doing something, which diminishes over time. Sometimes this is a hard cut-off, where the full reward is available up to some point in time and then drops to 0. Other times it might just diminish gradually over time.
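Both variants could be captured in one small function; the exponential decay and all names are just one possible choice, not part of the proposal itself:

```python
import math
from typing import Optional


def timed_reward(base: float, t: float, deadline: Optional[float] = None,
                 half_life: Optional[float] = None) -> float:
    """Hard cut-off at a deadline, or smooth exponential decay, or constant."""
    if deadline is not None:
        return base if t <= deadline else 0.0   # full reward, then nothing
    if half_life is not None:
        return base * math.exp(-math.log(2) * t / half_life)  # gradual decay
    return base
```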
Nestedness
It seems logical that tasks could be nested. There is the big task "Feed yourself", which might consist of searching for food and eating it. Searching for food in turn consists of looking for potential directions and moving in a promising direction.
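Nesting could be expressed as a composite task whose completion derives from its subtasks; a hedged sketch building on the `Task` class above:

```python
class CompositeTask:
    """A task made of ordered subtasks; it completes when all of them do."""

    def __init__(self, name: str, subtasks: list[Task]) -> None:
        self.name = name
        self.subtasks = subtasks

    def advance(self, dt: float) -> None:
        # Work on the first unfinished subtask; a scheduler could be smarter.
        for sub in self.subtasks:
            if sub.state is not TaskState.COMPLETED:
                sub.advance(dt)
                return

    @property
    def completed(self) -> bool:
        return all(s.state is TaskState.COMPLETED for s in self.subtasks)


feed_yourself = CompositeTask("feed yourself", [
    Task("search for food", duration=5),
    Task("eat", duration=2),
])
```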
Permission management
Who can interrupt which tasks? Who can claim which resources? Some mechanisms to let agents fight over that might be nice.
Multiple simultaneous tasks
The simplest implementation here would say an Agent has a capacity to handle X units of tasks, and if a task takes up some part of X, the remainder is available for other tasks. However, that might also need to include a function that scales a task's efficiency with the portion of capacity it receives (this might not always be linear).
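A sketch of such a scaling function; the sub-linear exponent is purely illustrative:

```python
def effective_progress(dt: float, share: float) -> float:
    """Scale task progress by the share of capacity the task receives.

    share ** 1.5 models the idea that splitting attention costs more
    than the raw split suggests; the exponent is an arbitrary choice.
    """
    return dt * share ** 1.5


# An agent splitting capacity 70/30 over two tasks for one time unit:
print(effective_progress(1.0, 0.7))  # ~0.59, less than 0.7
print(effective_progress(1.0, 0.3))  # ~0.16, less than 0.3
```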
Task States and Transitions
Tasks would benefit from more granular states beyond just active/interrupted/completed, such as "preparing", "waiting", or "paused". This needs clear rules about valid transitions between states and should track transition history for analysis.
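The valid transitions could live in a simple lookup table that guards every state change; all state names beyond active/interrupted/completed are the suggestions from above, nothing more:

```python
# Allowed transitions per state; anything else is rejected.
TRANSITIONS: dict[str, set[str]] = {
    "preparing":   {"active", "failed"},
    "active":      {"waiting", "paused", "interrupted", "completed", "failed"},
    "waiting":     {"active", "failed"},
    "paused":      {"active", "failed"},
    "interrupted": {"active", "failed"},
    "completed":   set(),
    "failed":      set(),
}


def transition(state: str, new_state: str, history: list[str]) -> str:
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    history.append(new_state)  # keep the trail for later analysis
    return new_state
```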
Failing
Tasks need robust failure handling - what happens when requirements become false during execution, how many retry attempts are allowed, and how to handle timeouts for stuck tasks. This connects to both state management and multi-agent coordination.
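A sketch of that failure loop, reusing the `Task`/`TaskState` sketch from above; `requirements_hold` and all limits are illustrative:

```python
def run_with_retries(task: Task, requirements_hold, max_retries: int = 3,
                     timeout: float = 100.0) -> bool:
    """Advance a task until completion, a failed requirement, or a timeout."""
    for _attempt in range(max_retries):
        task.elapsed, task.state = 0.0, TaskState.ACTIVE  # fresh attempt
        t = 0.0
        while task.state is TaskState.ACTIVE and t < timeout:
            if not requirements_hold():
                break  # requirement became false mid-execution: retry
            task.advance(dt=1.0)
            t += 1.0
        if task.state is TaskState.COMPLETED:
            return True
    return False  # give up after max_retries attempts
```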
Multi-Agent Coordination
Some tasks require multiple agents working together, raising questions about delegation, sharing, communication and conflict resolution. This could include task trading, bidding systems, and coordinated resource usage.
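A bidding system could start as simply as a sealed-bid auction; a toy sketch with invented agent names:

```python
def assign_by_auction(bids: dict[str, float]) -> str:
    """Sealed-bid auction: the highest bidder wins the task."""
    return max(bids, key=bids.__getitem__)


# Three agents bid their expected reward for a shared hauling task:
print(assign_by_auction({"ant_1": 4.0, "ant_2": 7.5, "ant_3": 6.1}))  # ant_2
```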
Learning and Adaptation
Tasks could improve in efficiency with repetition, agents could learn optimal task sequences, and historical performance could affect future decisions. This allows for dynamic adaptation to changing conditions.
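One common way to model improvement with repetition is a power-law learning curve; the exponent here is a free parameter, not anything Mesa prescribes:

```python
def learned_duration(base_duration: float, repetitions: int,
                     learning_exponent: float = 0.2) -> float:
    """Duration shrinks with experience, with diminishing returns."""
    return base_duration * (repetitions + 1) ** -learning_exponent


for n in (0, 1, 9, 99):
    print(n, round(learned_duration(10.0, n), 2))  # 10.0, 8.71, 6.31, 3.98
```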
Let's discuss
Very curious what everybody thinks about this. Give any thoughts and feedback you like, but I also have a few specific points.