Welcome to some past posts. I don't have many excuses. I've been coding but not writing.

day 15

To be honest, I'm writing this on day 18, so I might leave out some critical details, but here's the project I started.

A few months ago I started taking this MOOC from the University of Michigan called Statistics with Python (or something like that, understanding stats with py...). It started off easy enough, but then the difficulty ramped up pretty quickly and it turned into one of those courses where they lecture at you and then expect you to understand how to do multivariate analysis and understand blood pressure immediately too.

I don't mind classes being hard, but I absolutely, more than anything, HATE when I'm taking an online class that I'm paying for and I'm told to go search for other resources. Why am I paying for this course if they're going to just tell me the obvious and expect me to go do my learning elsewhere? If I'm paying for a proprietary resource, I'd better be able to put my full trust in that resource to be my everything. I am a self-taught programmer; I understand when and how to search for things on Google.

Don't set up a class if you expect the learning to happen elsewhere.

So I quit the course. I'm not going to pay money for someone to tell me to go look for alternate learning resources elsewhere.

What I did learn from this course, though, is that you really really really need to be able to connect the learning back home in your head, which meant I had to start my own statistics research project. And I had one all thought up before I even knew it.

I play Fortnite. And I love it. However, the game collects only certain statistics (games played, wins, kills) and I noticed at the end of every single game you can pull up a menu of game stats about the game you just had.

I once had an idea to do this with another game I played, Star Wars Battlefront. There was so much data in the end game menu it was incredible. I felt like we could learn about how the computer plays and why on certain levels it's so hard to win games.

The problem with both of these games is how to export that data in a format that can be collected and then studied. I started putting together my own dataset with Star Wars Battlefront a few years back by hand. I printed off a spreadsheet with the aspects of the overall data I thought were important and, at the end of every game, wrote that data down.

This practice was very tedious. However, my friend and I started calling our gaming sessions "data collection". Back then, I had no intention of using this data to understand data; I just wanted to learn more about the game.

The project never panned out and I never actually learned too much (that I remember anyway) through the data collection process.

The process itself had me searching for ways to somehow export that data. I researched countless ways to find out about the inner workings of the Xbox game (the original Xbox) and looked into whether I could install some sort of device that would take data from my screen and save it (this was long before I knew about video game streaming, and really before it was "popular").

The project died for many reasons. One was that playing video games wasn't as much fun if I was doing work at the same time. Another was that I became too obsessed with other aspects of data collection that had nothing to do with the dirty work of actually collecting the data.

Years later comes round two. And I had a plan this time. My Nintendo Switch can take screenshots, so I would have a real-time way to collect that data. After programming for a few years, I was also a little more confident I could take this screenshot data and read things out of it, and hopefully then design a class that would read that data into a dataset.

So I told my friends to remind me to take screenshots at the end of every game, no matter what, so much so that it has now become a habit.

Here was my plan. Before deciding to take screenshots, I had researched whether there was a way to take a picture with text in it and pull that text out. This kind of tech gets used in the job application process and for mining data from PDFs. I found a library that would do just this called Tesseract OCR, which even has a Python wrapper called pytesseract.

For the last few months I didn't even think about "collecting" the data like this, I didn't even test this theory. I assumed this method was going to work after watching a few tutorials with this library.

This Wednesday I decided to start collecting this data (motivated by finding a game data analyst job application but nothing to show for my data analyst wannabeness).

Here's how that plan went:

In [4]:
import pytesseract
from PIL import Image # a library to interact with images. maybe you could have guessed that tho..

Here's the image.

[image: stats]

I cut this selection from the entire screenshot (you'll see why in a second) to cut down the amount of information Tesseract had to process. I thought this would work better.
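
I made that crop by hand, but the same cut could probably be scripted. Here's a minimal sketch with Pillow's `Image.crop`, assuming the stats panel always sits in the same spot on the screen. The function names and the default fractions are hypothetical placeholders, not measured from the real screenshot.

```python
def crop_box(width, height, left, top, right, bottom):
    """Turn fractional coordinates (0.0 to 1.0) into a pixel box for Image.crop."""
    return (
        round(width * left),
        round(height * top),
        round(width * right),
        round(height * bottom),
    )


def crop_stats_panel(path, fractions=(0.5, 0.25, 1.0, 0.75)):
    """Open a screenshot and return only the region described by `fractions`.

    The default fractions are made up for illustration; they would need to be
    adjusted by eye against an actual end-of-game screenshot.
    """
    from PIL import Image  # imported here so crop_box works without Pillow

    img = Image.open(path)
    return img.crop(crop_box(img.width, img.height, *fractions))


# Example box for an assumed 1280x720 screenshot
example_box = crop_box(1280, 720, 0.5, 0.25, 1.0, 0.75)
```

Using fractions instead of raw pixels is just a design choice so the same numbers keep working if the screenshot resolution changes.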

Also, +5,000 damage to players. Let's go. I'm a lazer beam.

This was in squads as well so I didn't actually make it all the way to the end of the game.

In [5]:
img = Image.open('./Selection_773.png')
# this is so crazy simple to do
text = pytesseract.image_to_string(img, lang='eng') # language to english, pretty easy to understand syntax

text
Out[5]:
'BACLT Moo ONL\n\nTE ae Ue OM a\none Pec tt a)\nProv 0 Materials Used 0}\nPec oy? PT Eu 532\nPn PLE ns AEP Le\n\nHead Shots 98 DamageTo Structures 5,035'

Soooooooooo....yeah. That's basically impossible to get data from consistently. The first time I did it was even worse, so I guess sometimes it does pretty well.
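
To show what "getting data from it" could look like, here's a rough sketch that pulls out only the labels that survived OCR in the output above. The label list and the `parse_stats` helper are my own assumptions for illustration. It tolerates missing spaces (like `DamageTo`), but it can't recover a label that came out garbled, and it has no way of knowing whether a number was misread.

```python
import re

# Hypothetical label list: only the labels that are readable in the OCR output.
KNOWN_LABELS = ["Head Shots", "Materials Used", "Damage To Structures"]


def parse_stats(text, labels=KNOWN_LABELS):
    """Return {label: number} for every known label found in the OCR text."""
    stats = {}
    for label in labels:
        # Allow zero or more whitespace characters between the words of a label.
        words = [re.escape(word) for word in label.split()]
        pattern = r"\s*".join(words) + r"\s*(\d[\d,]*)"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            stats[label] = int(match.group(1).replace(",", ""))
    return stats


ocr_text = (
    "BACLT Moo ONL\n\nTE ae Ue OM a\none Pec tt a)\n"
    "Prov 0 Materials Used 0}\nPec oy? PT Eu 532\nPn PLE ns AEP Le\n\n"
    "Head Shots 98 DamageTo Structures 5,035"
)
found = parse_stats(ocr_text)
```

Three labels out of a whole stats screen is not much of a dataset, which is pretty much the point.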

And that's with the rest of the picture already cut out. There are methods of preprocessing photos so that this library can read them more easily, but here's the problem I'm always facing: do I go off into the jungle like Charlie Hunnam in Lost City of Z with faith that I might actually find the City of Z (the city being the process to preprocess pictures to perfectly provide me the data I need), or do I enter it by hand into a spreadsheet?
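
For the record, a first step into that jungle might look something like the sketch below: grayscale, upscale, then threshold to pure black and white with Pillow before handing the image to pytesseract. I haven't tested this against the screenshot, so the scale, the cutoff, and whether inverting helps are all assumptions, and the helper names are hypothetical.

```python
def threshold_table(cutoff=128):
    """256-entry lookup table: values below cutoff become black, the rest white."""
    return [0 if value < cutoff else 255 for value in range(256)]


def preprocess_for_ocr(img, scale=2, cutoff=128, invert=False):
    """Return a grayscale, enlarged, black-and-white copy of a PIL image.

    scale, cutoff, and invert are guesses to tune by trial and error.
    Set invert=True if the text is lighter than its background.
    """
    from PIL import ImageOps  # imported here so threshold_table works without Pillow

    gray = ImageOps.grayscale(img)
    if invert:
        gray = ImageOps.invert(gray)
    bigger = gray.resize((gray.width * scale, gray.height * scale))
    return bigger.point(threshold_table(cutoff))


# Possible usage, with the same file and call as in the cell above:
# from PIL import Image
# import pytesseract
# img = Image.open('./Selection_773.png')
# text = pytesseract.image_to_string(preprocess_for_ocr(img, invert=True), lang='eng')

table = threshold_table(128)
```

Whether this actually improves the output is exactly the kind of thing that could eat a week.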

Sometimes you have to do stuff the tedious way.

This is what I've been working on the past few days.

Also applying to grad school.

More on this in the next few posts.

1/3 (like twitter threads)