Tuesday, January 22, 2013

Data Collection

In any endeavor to gain a statistical understanding, the single most important aspect is the hunt for data. In many ways, gathering data seems like a trivial step, so obvious that it can be left as an afterthought. After all, to many people the data itself is uninteresting; it's what you learn from it that they truly care about.

The thing is, the data is the limiting factor for understanding. Collect the wrong data or ask the question improperly, and your data no longer represents what you're trying to understand. When large polling companies set up election surveys, a great deal of thought and knowledge goes into creating questions that accurately gauge the opinions of their sample.

Fortunately, since we're dealing with a computer game, we don't have to worry much about asking the right question: the game itself generates a huge volume of data. This data isn't exactly easily accessible, since it is stored on your computer in replay files or on Wargaming's servers, but it can be extracted.

The final problem is one of volume. For most statistical analyses, it is almost impossible to have too much data. When you're dealing with something as complicated as World of Tanks, with its 200+ different tanks in 12 different battle tiers, plus all of the possible combinations of modules and equipment, and 200K+ active players (just on the NA server), the more data you have, the more likely you are to be able to say something meaningful.

Which is where you come in. The more people we have sending in replays, the better off we are for data. The more data we have, the more confident we can be in our knowledge of the game. So the next post will show you how you can send us data so we can come up with pretty charts and graphs about our favorite game!

GL HF!
