I have done some further research on the topic and decided to expand the ruleset as shown below. (Note that the class percentages are just placeholders, explained in detail below.)
With this setup, downloading or streaming videos (at full speed) seems to have barely any effect on my in-game ping. The bufferbloat tests also come back with good results. I would like to see others' input on this.
Download:
Upload:
Reasoning for the changes:
Previously I've been using a 512/128 byte rule on download/upload respectively to give priority to one class on each side. This has worked reasonably well, but not perfectly.
I monitored a couple of games (mostly shooters) with different player counts to see where their packets end up on the size scale. A decent amount seems to be caught by 512/128. Think of team deathmatch style games with up to 12 players or so. (I actually expanded the download rule to 576, as without it there seemed to be a couple of uncaught packets.)
However, as player counts rise (24+) or there's some AI on the map, traffic very quickly spills past the previous rule on the download side. You can see this with the net_graph command in Source engine (Valve) games or just by looking at the Gargoyle classes. With 30 players, packet sizes easily climb into the 800-1000 byte range (depending on the server settings, of course).
A second rule of up to ~1280 bytes seems to catch all of these even at the highest player counts. This would make sense considering Source engine games have a setting (net_maxroutable) that, by default, splits packets larger than ~1100-1260 bytes (the exact number depends on the game and player settings) into multiple parts.
Most downloads or online video streams I tried seem to rely on very large packets. That means well above 1280 and up to 1500 bytes (the usual Ethernet MTU, so in practice the largest size you will see). Thus they automatically end up in bulk without any additional rules needed.
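To make the download-side rules concrete, here is a rough Python sketch of how the size thresholds sort packets. The class names are made up for illustration, and I am assuming the "up to" limits are inclusive. This is not how Gargoyle implements it internally; it only mirrors the 576/1280 cutoffs described above.

```python
# Download-side size rules from the post. The class names are hypothetical
# labels, and treating the limits as inclusive is an assumption.
DOWNLOAD_RULES = [
    (576, "gaming_small"),   # small-lobby game traffic
    (1280, "gaming_large"),  # high player counts or AI-heavy servers
]
DEFAULT_CLASS = "bulk"       # everything larger: downloads, video streams


def classify_download(packet_len: int) -> str:
    """Return the class for a packet of packet_len bytes (first match wins)."""
    for max_len, class_name in DOWNLOAD_RULES:
        if packet_len <= max_len:
            return class_name
    return DEFAULT_CLASS


if __name__ == "__main__":
    for size in (120, 576, 900, 1280, 1500):
        print(size, classify_download(size))
```

The rules are checked smallest first, so a packet lands in the tightest class it fits into and full-size packets fall through to bulk without needing a rule of their own.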
Some people mentioned using 128 bytes on the download side as well, but from my testing I see little point in that. Even with just a couple of people around in-game, the received packet sizes climb over that. If someone really wanted to be that granular, I feel like 256 would work better for gaming. But generally speaking you catch all of those with the 576 byte rule anyway, with barely any difference in the bandwidth funneled into that class.
Perhaps the 128 byte rule is about prioritizing ACK packets on the download side? Those would be more like 40/52 bytes in size though. Maybe SYNs at 60 bytes or so. Perhaps I'll do more testing on these specifically.
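The sizes guessed at above can be checked with some quick header arithmetic. This is a small Python sketch assuming IPv4 without IP options; the 20 bytes of SYN options is what a typical Linux-style SYN carries (MSS, SACK permitted, timestamps, window scale plus padding) and will vary between operating systems.

```python
# Minimum header sizes in bytes, assuming IPv4 with no IP options.
IPV4_HEADER = 20
TCP_HEADER = 20
# The TCP timestamp option is 10 bytes, commonly padded to 12 with two NOPs.
TCP_TIMESTAMP_OPTION = 12
# Assumption: typical SYN options (MSS 4 + SACK-permitted 2 + timestamps 10
# + window scale 3 + one NOP of padding) add up to 20 bytes.
TYPICAL_SYN_OPTIONS = 20


def tcp_packet_len(options_len: int = 0, payload_len: int = 0) -> int:
    """Total IP packet length for a TCP segment with the given options/payload."""
    return IPV4_HEADER + TCP_HEADER + options_len + payload_len


bare_ack = tcp_packet_len()                         # 20 + 20
ack_with_timestamp = tcp_packet_len(TCP_TIMESTAMP_OPTION)
typical_syn = tcp_packet_len(TYPICAL_SYN_OPTIONS)

if __name__ == "__main__":
    print(bare_ack, ack_with_timestamp, typical_syn)
```

So a rule at 52 bytes would cover pure ACKs with or without timestamps, while a SYN with the usual options would already fall just outside it.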
Upload does not seem to scale with player counts (which would make sense, as you are only talking to the server about yourself), so the 128 byte rule catches most of these games. However, even in my limited testing I found some games which, regardless of player count, operate in the up to 256 byte range and would not be caught by the 128 byte rule.
I'm not sure how I feel about the 52 byte rule yet. My understanding is that 40 bytes is the length of a TCP ACK packet, or 52 if it also contains a timestamp. A lot of games seem to operate above that (so in the 53-128 range). It's possible something could be done with this fact, but I'm not sure yet. Having the 128/256 byte classes would probably be sufficient for most people, so I'll probably end up removing it when I'm done testing.
Note that the class percentages are just placeholders. The optimal setup would probably be to reserve a low amount for the bulk class (10-20%) and share the rest roughly equally between the 2 or 3 prioritized classes, on the assumption that they will never actually fill up their limits. The classes with a larger packet size allowance should perhaps get a bit less reserved.
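As a rough illustration of that split, here is a small Python sketch that reserves a fixed share for bulk and divides the rest between the prioritized classes by weight. The function name, class names and weights are all hypothetical; equal weights give the "roughly equal" split, and a lower weight gives a class with a larger packet allowance a bit less.

```python
from typing import Dict


def split_bandwidth(bulk_pct: float, priority_weights: Dict[str, float]) -> Dict[str, float]:
    """Reserve bulk_pct for bulk and share the remainder by relative weight."""
    if not 0 <= bulk_pct <= 100:
        raise ValueError("bulk_pct must be between 0 and 100")
    total_weight = sum(priority_weights.values())
    if total_weight <= 0:
        raise ValueError("priority weights must sum to a positive number")
    remaining = 100 - bulk_pct
    shares = {
        name: remaining * weight / total_weight
        for name, weight in priority_weights.items()
    }
    shares["bulk"] = bulk_pct
    return shares


if __name__ == "__main__":
    # Equal split between two prioritized classes, 20% left for bulk.
    print(split_bandwidth(20, {"up_to_576": 1, "up_to_1280": 1}))
    # Slightly less for the class with the larger packet allowance.
    print(split_bandwidth(20, {"up_to_576": 3, "up_to_1280": 2}))
```

These numbers are only a starting point for experimenting, not tested recommendations.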