This repository is an attempt at creating a bot that plays Diablo 1. The game is analysed only through its UI, not by direct memory access. The repository is at a very early stage, with nothing useful in it for the moment.
This solution is heavily inspired by:
The project only works on the Microsoft Windows operating system. As Diablo is only released for Windows, this is not a new restriction.
- Obtain Diablo from GOG.
- Make sure python, pip, and virtualenv are installed. Latest versions are suggested.
- Check out the repository, create and activate the virtual environment, and install dependencies:
pip install -r requirements\requirements.txt
- Start Diablo in windowed mode (1440x1050 or higher suggested). The cropping currently expects the window title bar to be visible, so do not use fullscreen-windowed mode.
- Start the app:
python evil_snek\app.py
- Marvel at the beauty.
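Because the cropping expects the window title bar to be visible, the capture region has to be offset from the outer window rectangle. A minimal sketch, assuming hypothetical title bar and border sizes (they vary with Windows version, theme, and DPI scaling) and using MSS's `grab` with a monitor dict; this is not the repository's actual cropping code:

```python
from typing import Dict, Tuple

# Hypothetical values: measure them on your own machine, they depend on the
# Windows version, theme and DPI scaling.
TITLE_BAR_HEIGHT = 31
BORDER_WIDTH = 8


def client_capture_region(
    window_rect: Tuple[int, int, int, int],
    title_bar_height: int = TITLE_BAR_HEIGHT,
    border_width: int = BORDER_WIDTH,
) -> Dict[str, int]:
    """Turn an outer window rectangle (left, top, right, bottom), e.g. from
    win32gui.GetWindowRect, into an MSS monitor dict that skips the title bar
    and the window borders."""
    left, top, right, bottom = window_rect
    return {
        "left": left + border_width,
        "top": top + title_bar_height,
        "width": (right - left) - 2 * border_width,
        "height": (bottom - top) - title_bar_height - border_width,
    }


def grab_game_area(window_rect: Tuple[int, int, int, int]):
    """Capture only the game area; returns a BGRA NumPy array."""
    # Imported lazily so the pure helper above can be used without MSS.
    import mss
    import numpy as np

    with mss.mss() as sct:
        shot = sct.grab(client_capture_region(window_rect))
    return np.array(shot)  # shape (height, width, 4), BGRA channel order
```

Keeping the offset arithmetic in a pure function makes it easy to adjust the assumed sizes without touching the capture code.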
Automated testing is done in Linux containers. Please review azure-pipelines.yml for an up-to-date list of environment requirements, and for the steps to set up and test a fresh checkout.
Character recognition is solved for numbers; see the Jupyter Notebooks in the docs folder for examples.
As a primer on computer vision, I found the following page helpful: Image Processing 101
Convolutions, kernel size, blurring: https://www.youtube.com/watch?v=C_zFhWdM4ic
ocr_by_template_matching.ipynb
Running the application analyses the character tab. Every 3 seconds the tab is automatically opened, screenshotted, and closed, and the screenshot is used for further analysis. Several properties are analysed, such as experience and health. Please check the tests for up-to-date info.
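The open, capture, close cycle can be sketched as a generator. This is an illustration rather than the app's actual code: `press_key`, `take_screenshot`, and `analyse` are hypothetical callables injected so the loop can run and be tested without the game.

```python
import time
from typing import Any, Callable, Dict, Iterator


def poll_character_tab(
    press_key: Callable[[str], None],
    take_screenshot: Callable[[], Any],
    analyse: Callable[[Any], Dict[str, Any]],
    interval_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict[str, Any]]:
    """Yield one analysis result per cycle: open tab, capture, close tab, wait."""
    while True:
        press_key("c")             # open the character tab
        image = take_screenshot()  # capture while the tab is visible
        press_key("c")             # close it again so gameplay is not obstructed
        yield analyse(image)       # e.g. experience, health, ...
        sleep(interval_s)          # wait before the next cycle
```

Depending on the machine, a short pause between opening the tab and taking the screenshot may be needed for the tab to render; that is an assumption worth checking against the real game.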
identifying_health_potions.ipynb
The goal is to analyse the current belt: which kinds of potions are available on which hotkey.
This is the next target. With some OpenCV filters and the experience gained so far, it should be doable.
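One possible starting point, assuming the belt has already been cropped to a BGR(A) image and split evenly into its eight slots (hotkeys 1 to 8), is to classify each slot by its average colour: red for health potions, blue for mana potions. The thresholds below are hypothetical and untested against real screenshots, and every other potion type falls into a catch-all class.

```python
import numpy as np

# Hypothetical thresholds, not tuned on real screenshots.
MIN_BRIGHTNESS = 40.0  # below this, the slot is considered empty
DOMINANCE = 1.5        # how much the dominant channel must exceed the others


def classify_slot(slot: np.ndarray) -> str:
    """Classify one belt slot (H x W x 3 or 4, BGR(A)) by its mean colour."""
    b, g, r = slot[..., :3].reshape(-1, 3).mean(axis=0)
    if max(b, g, r) < MIN_BRIGHTNESS:
        return "empty"
    if r > DOMINANCE * max(b, g):
        return "health"  # red potions
    if b > DOMINANCE * max(r, g):
        return "mana"    # blue potions
    return "other"


def classify_belt(belt: np.ndarray, slots: int = 8) -> dict:
    """Split the belt image into equal-width slots and map hotkey -> class."""
    width = belt.shape[1] // slots
    return {
        hotkey: classify_slot(belt[:, (hotkey - 1) * width : hotkey * width])
        for hotkey in range(1, slots + 1)
    }
```

Mean colour is crude (the slot background dilutes it), but it gives a baseline to compare OpenCV filters against.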
Using the Win32 API, the Python app successfully sends mouse and keyboard events to the game, mocking user input. This is currently used to open and close the character tab with the 'c' key.
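The messages for a single key press can be built as follows. The lParam layout (repeat count in bits 0 to 15, scan code in bits 16 to 23, previous key state in bit 30, transition state in bit 31) follows the MSDN WM_KEYDOWN and WM_KEYUP documentation. This is a sketch, not the repository's code, and the posting itself only runs on Windows with pywin32 installed.

```python
from typing import List, Tuple

WM_KEYDOWN = 0x0100  # message values from WinUser.h
WM_KEYUP = 0x0101


def key_lparam(scan_code: int, key_up: bool) -> int:
    """Build the lParam for WM_KEYDOWN / WM_KEYUP."""
    lparam = 1                          # bits 0-15: repeat count
    lparam |= (scan_code & 0xFF) << 16  # bits 16-23: scan code
    if key_up:
        lparam |= 1 << 30               # bit 30: key was down before
        lparam |= 1 << 31               # bit 31: transition state (released)
    return lparam


def key_press_messages(vk_code: int, scan_code: int) -> List[Tuple[int, int, int]]:
    """(message, wParam, lParam) triples for one press and release."""
    return [
        (WM_KEYDOWN, vk_code, key_lparam(scan_code, key_up=False)),
        (WM_KEYUP, vk_code, key_lparam(scan_code, key_up=True)),
    ]


def send_key(hwnd: int, vk_code: int, scan_code: int) -> None:
    """Post the press/release messages to a window (Windows + pywin32 only)."""
    import win32api  # imported lazily so the helpers above work anywhere

    for message, wparam, lparam in key_press_messages(vk_code, scan_code):
        win32api.PostMessage(hwnd, message, wparam, lparam)
```

On Windows, something like `hwnd = win32gui.FindWindow(None, "DIABLO")` followed by `send_key(hwnd, ord("C"), 0x2E)` should toggle the character tab; the window title is an assumption, and 0x2E is the set-1 scan code of the C key.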
There is no progress here yet. At this early stage the focus of the project is solely on understanding the environment. Q-learning, a form of reinforcement learning, is planned to be applied to this problem.
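For reference, the core of tabular Q-learning is the update Q(s, a) ← Q(s, a) + α (r + γ max_a' Q(s', a') − Q(s, a)). A minimal sketch with an ε-greedy policy follows; the state and action names one would feed it (e.g. "low_hp", "flee") are hypothetical, and nothing here is wired to the game.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, List, Optional, Tuple


class QLearner:
    """Tabular Q-learning with an epsilon-greedy policy."""

    def __init__(self, actions: List[Hashable], alpha: float = 0.1, gamma: float = 0.9,
                 epsilon: float = 0.1, rng: Optional[random.Random] = None) -> None:
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = rng or random.Random()
        self.q: DefaultDict[Tuple[Hashable, Hashable], float] = defaultdict(float)

    def choose(self, state: Hashable) -> Hashable:
        """Explore with probability epsilon, otherwise pick the best known action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state: Hashable, action: Hashable, reward: float,
               next_state: Hashable) -> None:
        """Move Q(s, a) towards the TD target r + gamma * max_a' Q(s', a')."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

The hard part for Diablo is not this update but defining a compact state from the analysed UI, which is why the project focuses on understanding the environment first.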
Further resources to look into:
- Mady - Deep Reinforcement learning
- Reinforcement Learning: An Introduction
- DeepMind Course on reinforcement learning
- MSS -> Grab screenshots of the game; its docs help a lot
- OpenCV 4.0 -> Process the image stream
- pywin32, Win32 API PostMessage -> Send keyboard and mouse events to the window, simulating user input
- Jupyter Notebook describing OCR: OpenCV template matching, with some additional image preprocessing.
- Docker -> For automating tests. Linux containers are used.
- Pyright as a VS Code plugin for development-time type checking.
Extra resource to check:
- The Brood War API
Automating the UI events to Diablo, a DirectX game, was challenging, although looking back it is pretty simple. Instead of spending hours googling for partially working solutions, I suggest an analytical approach. Spy++, a tool distributed with Visual Studio, can watch the Diablo window and log the messages it receives. The relevant events, for both keyboard and mouse actions, are logged neatly. The MSDN documentation explains what all the message parameters mean. The resulting code might not be super nice, but it will work.
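When comparing what Spy++ logged with what the code posts, it helps to decode a raw keyboard lParam into the bit fields described in the MSDN WM_KEYDOWN documentation. A small sketch; the `KeyLParam` name is made up for illustration:

```python
from typing import NamedTuple


class KeyLParam(NamedTuple):
    repeat_count: int
    scan_code: int
    extended: bool          # extended key, e.g. right-hand ALT/CTRL
    context_code: bool      # ALT held (relevant for WM_SYSKEYDOWN)
    previous_state: bool    # key was already down before this message
    transition_state: bool  # True for a key release


def decode_key_lparam(lparam: int) -> KeyLParam:
    """Split a WM_KEYDOWN / WM_KEYUP lParam into its documented bit fields."""
    return KeyLParam(
        repeat_count=lparam & 0xFFFF,
        scan_code=(lparam >> 16) & 0xFF,
        extended=bool((lparam >> 24) & 1),
        context_code=bool((lparam >> 29) & 1),
        previous_state=bool((lparam >> 30) & 1),
        transition_state=bool((lparam >> 31) & 1),
    )
```

Decoding the values the bot sends and checking them against the fields Spy++ shows for a real key press is a quick way to spot a wrong bit.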
The solution is based on divide and conquer: it rotates across modules, and each module is one flow or iteration of measure, plan, act. We need to solve the following modules:
Based on the current contents of the inventory, equip the best gear. "Best" depends heavily on the class, and on a specific strategy within the class, for example a warrior with one-handed weapons.
Identify enemies. Kill enemies. Collect the loot and gold. Understand whether the inventory is full. Most importantly: don't die. As an advanced feature, understand enemy resistances: later in the game, certain enemies can only be killed with certain damage types. This might require advanced inventory management, such as keeping multiple weapons in stock.
Return to town. Recover health. Sell extra items. Repair items. Spend money, buy additional items.
Managing the character: allocating character points. The ideal distribution depends heavily on the exact strategy of the character. It needs to work well together with inventory management, as items have attribute requirements.
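The rotation of measure, plan, act modules can be sketched with a small interface. The `Module` protocol, `run_rotation`, and the `CounterModule` stub are hypothetical illustrations, not classes from the repository:

```python
from typing import Any, List, Protocol, Sequence, Tuple


class Module(Protocol):
    """One measure -> plan -> act flow, e.g. inventory, combat, town, character."""

    name: str

    def measure(self) -> Any: ...
    def plan(self, observation: Any) -> List[str]: ...
    def act(self, actions: List[str]) -> None: ...


def run_rotation(modules: Sequence[Module], rounds: int) -> List[Tuple[str, List[str]]]:
    """Give each module one measure/plan/act iteration per round, in order."""
    history: List[Tuple[str, List[str]]] = []
    for _ in range(rounds):
        for module in modules:
            observation = module.measure()
            actions = module.plan(observation)
            module.act(actions)
            history.append((module.name, actions))
    return history


class CounterModule:
    """Hypothetical stub: 'measures' by counting calls, plans one step per call."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.calls = 0
        self.performed: List[str] = []

    def measure(self) -> int:
        self.calls += 1
        return self.calls

    def plan(self, observation: int) -> List[str]:
        return [f"{self.name}-step-{observation}"]

    def act(self, actions: List[str]) -> None:
        self.performed.extend(actions)
```

A fixed round-robin is the simplest schedule; a real bot would likely let urgent modules (such as "don't die") interrupt the rotation.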
Tesseract was used as the Optical Character Recognition (OCR) engine, but it was dropped. It turned out to be very unreliable for this application, as it was really designed for other purposes:
- Tesseract can identify the location of text in an image. I do not need this. The game UI is pretty static, I have a very good approximation on where the text will be.
- Off-the-shelf trained data (official or community) is not trained for Exocet, the font of Diablo, which the game almost exclusively uses. The usual models are trained on many fonts, but not on Exocet.
- Results were very brittle to changes in the source data. Cropping the image 1-2 pixels differently (without cutting into the important parts, and with preprocessing to eliminate background noise) produced very chaotic results. This made it really hard to progress in the development with confidence.
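Because the text location is known and the game uses a single font, matching each digit against a stored template is sufficient. The sketch below is a pure-NumPy stand-in for `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED` (the notebook uses OpenCV); the 0.9 threshold and the greedy overlap suppression are assumptions, and real digit templates would have to be cut from game screenshots.

```python
from typing import Dict, List, Tuple

import numpy as np


def match_scores(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Normalized cross-correlation at every offset (like cv2.TM_CCOEFF_NORMED)."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    scores = np.full((ih - th + 1, iw - tw + 1), -1.0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            window = image[y:y + th, x:x + tw]
            w = window - window.mean()
            denom = np.sqrt((w * w).sum()) * t_norm
            if denom > 0:  # flat windows (e.g. pure background) stay at -1
                scores[y, x] = (w * t).sum() / denom
    return scores


def read_number(image: np.ndarray, templates: Dict[str, np.ndarray],
                threshold: float = 0.9) -> str:
    """Read the digits in a pre-cropped, binarized grayscale strip."""
    hits: List[Tuple[float, int, str, int]] = []
    for digit, template in templates.items():
        scores = match_scores(image, template)
        for y, x in zip(*np.where(scores >= threshold)):
            hits.append((float(scores[y, x]), int(x), digit, template.shape[1]))
    # Greedy suppression: keep the strongest hits, drop any overlapping a kept one.
    hits.sort(reverse=True)
    kept: List[Tuple[float, int, str, int]] = []
    for score, x, digit, width in hits:
        if all(abs(x - kx) >= min(width, kw) for _, kx, _, kw in kept):
            kept.append((score, x, digit, width))
    kept.sort(key=lambda hit: hit[1])  # left to right
    return "".join(digit for _, _, digit, _ in kept)
```

Since each template only has to distinguish ten digits in one font at a known position, a 1-2 pixel change in the crop shifts the match location rather than changing the result, which is exactly where Tesseract struggled.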