I never said contradiction, I said it's not enough to draw a conclusion.
I also didn't say you did, but I think you are slightly underestimating what theoretical calcs can do if read properly. But to be honest, I think we pretty much agree, there is no real point in arguing in the first place.
What spreadsheets, or any amalgamation of stats for that matter, allow you to do is get insight into things you can't (or only with great difficulty) deduce from gameplay alone. Sure, you can get a rough idea of how likely it is to pen a PzIV with a Sherman, but without looking up the exact value from some external source you'll likely never be able to narrow it down precisely, due to the huge influence of RNG.
Just to drive this point home for everyone:
Assume you try to deduce how well a Sherman penetrates frontally at medium range vs an Ostheer P4. You have 20 players discussing their experience with the Sherman. We don't know what they understand by "mid range", and even if everyone understands roughly 20 meters, anything from probably at least 15-25 meters will get classified as mid range. On top of that come variations from players driving their tanks: missed shots, reduced accuracy due to moving, and increased penetration from engaging at an angle and getting a rear armor hit.
Let's put this into a test setup: 20 meters distance (120 penetration vs 180 armor), flat terrain, only frontal shots possible, only actual hits counting. We count 50 hits, assuming that the players in the discussion above would have roughly their last 50 shots in mind when thinking about the engagements. We run this test 20 times, once per player, to form each player's experience.
Basic statistics tells us that the real penetration chance is 120/180 = 66.7%, meaning 50*(120/180) = 33.33 shots should penetrate, with a standard deviation of sqrt(50 * 2/3 * 1/3) = 3.33 shots (10% of the mean in our case). By the usual one-sigma rule of thumb, this means that almost one third of the players discussing will have had the Sherman penetrate fewer than 30 times or more than ~37 times. One player (the roughly 5% outside two standard deviations) will have had the experience of the Sherman penetrating fewer than 27 times or more than 40 times, and will probably be screeching about the trash Sherman or saying it is OP.
Even if all discussion was fully rational (good luck with that), players will regularly report a penetration chance anywhere from below 60% to about 75%.
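For anyone who wants to check these numbers themselves, here is a minimal Python sketch of the test setup. It assumes every hit is an independent roll with penetration chance 120/180, so the number of penetrations in 50 hits follows a binomial distribution. The function names are made up for this example. Note that it computes the exact binomial tails, which can come out somewhat different from the one-sigma and two-sigma rules of thumb used above.

```python
from math import comb, sqrt

PEN, ARMOR = 120, 180   # penetration and armor values from the test setup
SHOTS = 50              # hits counted per player

p = PEN / ARMOR                    # chance that a single hit penetrates
mean = SHOTS * p                   # expected number of penetrations
sd = sqrt(SHOTS * p * (1 - p))     # binomial standard deviation

def pmf(k, n=SHOTS, prob=p):
    """Probability of exactly k penetrations in n hits."""
    return comb(n, k) * prob**k * (1 - prob)**(n - k)

def prob_outside(low, high):
    """Probability of fewer than `low` or more than `high` penetrations."""
    return sum(pmf(k) for k in range(SHOTS + 1) if k < low or k > high)

p_one_sigma = prob_outside(30, 37)   # roughly one standard deviation off
p_two_sigma = prob_outside(27, 40)   # roughly two standard deviations off

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
print(f"P(<30 or >37) = {p_one_sigma:.1%}")
print(f"P(<27 or >40) = {p_two_sigma:.1%}")
```

Multiplying the two probabilities by 20 gives the expected number of players in the discussion who land in each group.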
Funny side note: The skill level doesn't matter. This one player could be Luvnest or VonIvan, not just your level 2000 rando. Those 50 shots probably represent the last 5 games, which is about 3 hours of games or 1-2 evenings of playing CoH2 with this faction and this unit. This in turn means that it might be all the experience you get in 1-2 weeks of playing CoH2 if you also play all other factions equally. This one player, high skill or not, will say that the Sherman is OP or trash. And if they are known as a good player, their opinion will have a larger impact, although it actually should not. And we can prove that with numbers.
Back to topic: On top of that, all this assumes that this sample of players actually gets a "representative set of RNG". If I just hit refresh on the numbers, I regularly see that even all their "experience data" pooled together will misjudge the real penetration value by a couple of %, and that the standard deviation varies decently as well (often 7-13%, meaning players will be more in agreement if it gets smaller and more in disagreement if it gets larger). Is this huge? Not huge, but we've regularly seen penetration and armor values being decreased by 10-20, which depending on shooter and target often changes the penetration chance by only a couple of percent. So yes, we're still in the range of actual balance discussions being influenced by RNG.
If you want a really reliable discussion where the values only vary by one or two percent, you'd need 100 players. That would be 5 pages on this forum if everyone just states their experience, no discussion involved yet. And don't forget: Everyone just tested the test setup, there still is no variation due to a real game.
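The "hit refresh" experiment can be reproduced with a few lines of Python. This is only a sketch of the idea, not the spreadsheet used for the numbers above: it assumes independent rolls at 120/180 penetration chance, 20 players and 50 hits each, and the function name and seed are arbitrary choices for the example.

```python
import random
from statistics import mean, stdev

def simulate_players(n_players=20, shots=50, p=120 / 180, seed=None):
    """Return the number of penetrations each simulated player observed."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(shots))
        for _ in range(n_players)
    ]

counts = simulate_players(seed=1)      # change the seed to "hit refresh"
pooled = sum(counts) / (20 * 50)       # everyone's experience pooled together
relative_spread = stdev(counts) / mean(counts)   # spread relative to the mean

print("penetrations per player:", counts)
print(f"pooled penetration chance: {pooled:.1%} (real value: {120 / 180:.1%})")
print(f"spread between players: {relative_spread:.1%} of the mean")
```

Running it with different seeds shows how much both the pooled value and the spread between players move from one set of 20 players to the next.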
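As a rough cross-check of the player counts, the uncertainty of the pooled penetration chance can be estimated with the standard error of a proportion. This sketch assumes independent hits and 50 hits per player, as in the test setup; the helper name is made up for the example.

```python
from math import sqrt

def pooled_standard_error(players, shots=50, p=120 / 180):
    """Standard error of the pooled penetration chance over all players."""
    return sqrt(p * (1 - p) / (players * shots))

for players in (20, 100):
    se = pooled_standard_error(players)
    # Two standard errors cover roughly 95% of outcomes.
    print(f"{players} players: pooled value within about +/- {2 * se:.1%}")
```

Under these assumptions, 20 players pin the pooled value down to about 3 percentage points either way, and 100 players to a bit over 1 percentage point.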