Large language models can have billions, or even trillions, of parameters. But how big do they need to be to achieve acceptable performance? To test this, I experimented with several of Google’s Gemma 3 models, all small enough to run locally on a single GPU. Specifically, I used the 1 […]
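
For anyone who wants to try this themselves, here is a minimal sketch of loading one of the smaller Gemma 3 checkpoints on a single GPU. It assumes Hugging Face `transformers` with the `google/gemma-3-1b-it` model id; that tooling and checkpoint choice are illustrative assumptions, not necessarily the exact setup used here. (Gemma checkpoints are gated on Hugging Face, so you may need to accept the license and authenticate first.)

```python
# Sketch: load a small Gemma 3 checkpoint locally and generate text.
# Assumes Hugging Face transformers and the "google/gemma-3-1b-it" model id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"  # hypothetical choice for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit comfortably on one GPU
    device_map="auto",           # place the weights on the available GPU
)

prompt = "In one sentence, why do larger language models tend to perform better?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Swapping `model_id` for a larger variant is enough to repeat the same experiment at a different scale, memory permitting.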