If anyone wants to help host it here (possibly with a lot of cool features added later, such as a diff), here is the plan I would follow. Feel free to use this knowledge if you want to host it somewhere else too. Just let us know, at least, so we don't duplicate work toward the same goal.
The ultimate goal is a command-line application, usable on Windows, that takes as input the game attributes folder with all its files plus our database login, and as output inserts the data into our database. Sharing the code with us (open source or not) would be ideal so we can keep maintaining it if the original dev leaves or gives up, but it is not a requirement.
Running it on Linux instead could be an option, but then I'd need to manually transfer the data (around 100 MB) from my computer to our server every time, since I don't think the game can be installed there.
I have yet to define the database structure to hold that data the way I would like, but the parsing job can already be started. I also might not want to insert directly into the database, but instead go through a page that filters the data and possibly does some pre-work, such as archiving old data or publishing a diff.
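To make the goal above a bit more concrete, here is a rough sketch of what the command-line interface could look like. Every name in it (build_parser, attrib_dir, --db-host, --db-user, --db-name, --dry-run) is made up for illustration, nothing is decided yet, and password handling is left out on purpose.

```python
import argparse


def build_parser():
    """Build a hypothetical command line for the importer (all option names are placeholders)."""
    parser = argparse.ArgumentParser(
        description="Read extracted CoH2 attribute files and prepare them for our database."
    )
    # Folder produced by extracting AttribArchive.sga and converting the rgd files.
    parser.add_argument("attrib_dir", help="folder containing the converted attribute files")
    # Database login; the real set of options depends on the database we end up using.
    parser.add_argument("--db-host", default="localhost", help="database server")
    parser.add_argument("--db-user", required=True, help="database user name")
    parser.add_argument("--db-name", required=True, help="database name")
    # Lets us test the parsing before the database structure is defined.
    parser.add_argument("--dry-run", action="store_true", help="parse only, write nothing")
    return parser


# Example call with an explicit argument list instead of the real command line.
example_args = build_parser().parse_args(
    ["C:/coh2/attrib", "--db-user", "stats", "--db-name", "coh2", "--dry-run"]
)
```

The --dry-run switch is only there because the database structure is not defined yet, so the parsing side can be developed and checked on its own.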
Here's the required info:
The data is on "C:\Program Files (x86)\Steam\steamapps\common\Company of Heroes 2\CoH2\Archives\AttribArchive.sga"
The sga archive can be extracted using cc2sga.exe, which can be found here:
http://forums.relicnews.com/showthread.php?270246-TOOL-CoH2-SGA-Extractor-v-1-1-02-04-13
The extracted files are all in the rgd format, which sucks. But fortunately cope is a beast and also made rgdConv.exe, which can be found here:
http://forums.relicnews.com/showthread.php?270284-TOOL-CoH2-RGD-Tools-%28XML-TXT-JSON%29-RGD-Crawler-Hashing-Tools-v1-2-22-04-13
You then end up with a lot of folders and files, and those are the input for the application to be written.
Now, to generate the output you need to browse those files to get the required info. The tools above can output json, xml, or txt. It doesn't matter to me which one you decide to work from.
The main difficulty of the job is deciding which files to read and which to skip, and whether and how to link them together.
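As a starting point for browsing those files, here is a minimal sketch that walks the converted folder and builds an index of every file, keyed by its path relative to the root. It assumes the json output was chosen and that the converted files end in .json; the function name index_attribute_files is made up.

```python
from pathlib import Path


def index_attribute_files(root, suffix=".json"):
    """Map 'relative/path/without/extension' (lowercase, forward slashes) to the file's full path."""
    root = Path(root)
    index = {}
    for path in sorted(root.rglob("*" + suffix)):
        if path.is_file():
            # Normalise the key so lookups do not depend on slashes or letter case.
            key = path.relative_to(root).with_suffix("").as_posix().lower()
            index[key] = path
    return index
```

With an index like this, looking up "ebps/races/german/soldiers/grenadiers/grenadiers_mp" gives back the file to open, without caring whether a path was written with backslashes or in mixed case.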
I can give an example for the German grenadier.
You need to know it is in ebps\races\german\soldiers\grenadiers\grenadiers_mp.json
There you can find everything about the gren. We might not want to display everything on the site, because that wouldn't be very user-friendly or even useful.
You can refer to the current page, while it is still up, if that helps:
http://coh-moderncombat.com/CoH2Stats/Axis/Soldiers/grenadiers.html
The first array in the file contains the abilities: grenadier_panzerfaust_mp, grenadier_rifle_grenade_ability_mp, etc.
Basically all the info is in the file; what needs to be defined is what is where. It is probably a good idea to list here, as a post, what info we want to grab and what it corresponds to in the file. This is not even coding work yet, just naming elements and sorting them out.
Then, once it's done for the grenadier, it obviously has to be done for all the other units, and for every reference they use.
To take the example again, we're going to list grenadier_panzerfaust_mp on the grenadier's page as an ability. So we need to show its info and give it a dedicated page. We need to know, or figure out, that the info is in "abilities\german\modal_ability\accessory_weapons\grenadier_panzerfaust_mp.json"
There we can find info such as the cost.
Then we also need to know about "slot_item\german\ballistic_weapon\infantry_at_weapon\grenadier_panzerfaust_mp.json", which tells us the actual weapon is "weapon\axis\ballistic_weapon\infantry_anti_tank_weapon\panzerfaust_atw_mp.json", so we can actually see the damage it does (80 as it is now), deflection damage 0, penetration 1000 if I read this right, how it is fired, etc.
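Part of following such a chain of references might be scriptable. The sketch below collects every string value in a parsed json file and keeps the ones that name another known file. This rests on assumptions I have not checked: that references are stored as plain strings resembling relative paths, and that the converted files are UTF-8. The helper names (iter_strings, normalize, find_references, references_in_file) are made up.

```python
import json


def iter_strings(node):
    """Yield every string value found anywhere in a parsed json document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def normalize(ref):
    """Turn a path-like string into a lowercase, forward-slash key without extension."""
    key = ref.replace("\\", "/").lower()
    for ext in (".json", ".xml", ".txt", ".rgd"):
        if key.endswith(ext):
            return key[: -len(ext)]
    return key


def find_references(document, known_keys):
    """Return, in order of appearance, the strings in document that name a known file."""
    found = []
    for text in iter_strings(document):
        key = normalize(text)
        if key in known_keys and key not in found:
            found.append(key)
    return found


def references_in_file(path, known_keys):
    """Load one converted file (assumed UTF-8 json) and list the known files it mentions."""
    with open(path, encoding="utf-8") as handle:
        return find_references(json.load(handle), known_keys)
```

Here known_keys would be the set of normalised relative paths of all extracted files. If references turn out to be stored differently (bare names, hashes, or paths relative to some other folder), only normalize and the membership test would need to change.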
In summary, 80% of the work is figuring out what everything is and where it lives, and what we actually want to display (the current coh2stats site is missing a lot of interesting information, for example criticals). Being able to ctrl+f on both filenames and file contents across multiple files at once is a must. Some of this may or may not be scriptable. Then 10% is actually writing the parser, and 10% is writing the conversion to whatever format I decide on (most likely either sql or an http request). After that it's on us to render the info properly on the site from our database.
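For the ctrl+f part, if you don't have an editor that searches a whole folder, a small script can do it. This is only a sketch: it assumes the converted files are .json text readable as UTF-8, and the function name search_files is made up.

```python
from pathlib import Path


def search_files(root, needle, suffix=".json"):
    """Case-insensitive search for needle in file names and in file contents under root.

    Returns (name_hits, content_hits): a list of paths whose file name contains
    needle, and a list of (path, line_number, line_text) tuples for matching lines.
    """
    needle = needle.lower()
    name_hits, content_hits = [], []
    for path in sorted(Path(root).rglob("*" + suffix)):
        if not path.is_file():
            continue
        if needle in path.name.lower():
            name_hits.append(path)
        # errors="replace" keeps the search going if a file has odd bytes in it.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                content_hits.append((path, number, line.strip()))
    return name_hits, content_hits
```

Searching for "panzerfaust" this way would list both the files named after it and every file that mentions it, which is the kind of cross-reference hunting described above.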