Large language models can have billions, or even trillions, of parameters. But how big do they need to be to achieve acceptable performance? To test this, I experimented with several of Google's Gemma 3 models, all small enough to run locally on a single GPU. Specifically, I used the 1 billion parameter model, the 4 billion parameter model, and the 12 billion parameter model. Obviously these are significantly smaller than Google's proprietary Gemini models with hundreds of billions of parameters, but let's see how well they perform on some question answering tasks.
The code I used to do this comparison is available here.
Evaluating Plain LLMs
To start with, I gave each model a very simple instruction prompt:
You are a helpful chatbot designed to provide factual information. Give two to three sentences of information
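The full comparison code is linked above, but the basic shape of it is simple. Here is a rough sketch rather than the exact harness, assuming the models are served locally through Ollama (which is what tags like gemma3:1b refer to):

```python
import time
import ollama  # pip install ollama; assumes gemma3:1b, gemma3:4b, and gemma3:12b are already pulled

INSTRUCTIONS = ("You are a helpful chatbot designed to provide factual information. "
                "Give two to three sentences of information")

def ask(model: str, question: str) -> str:
    """Send one question to a local model with the instruction prompt and time the response."""
    start = time.time()
    response = ollama.chat(
        model=model,
        messages=[
            {"role": "system", "content": INSTRUCTIONS},
            {"role": "user", "content": question},
        ],
    )
    answer = response["message"]["content"]
    print(f"Agent {model}\n{answer}\nTook {time.time() - start} seconds")
    return answer

for model in ("gemma3:1b", "gemma3:4b", "gemma3:12b"):
    ask(model, "Who is Immanuel Kant?")
```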
Let’s first see if they can answer some questions about historical figures. For this example, I chose Immanuel Kant and asked, “Who is Immanuel Kant?” These are the results.
Agent gemma3:1b
Immanuel Kant (1724-1804) was a German philosopher who is considered one of the most influential thinkers in Western history. He is best known for his work on ethics, particularly the concept of the categorical imperative – a fundamental principle of moral action. His philosophy heavily influenced subsequent philosophical and scientific thought.
Took 2.348175048828125 seconds
Agent gemma3:4b
Immanuel Kant (1724-1804) was a highly influential German philosopher considered one of the central figures of the Enlightenment. He is best known for his “critique of pure reason,” which explored the limits of human understanding and argued for the importance of both reason and experience in forming knowledge. His work profoundly impacted ethics, metaphysics, and epistemology, shaping Western philosophical thought for centuries.
Took 24.129546403884888 seconds
Agent gemma3:12b
Immanuel Kant was a highly influential German philosopher of the Enlightenment era, born in 1724. He is best known for his work on epistemology (the study of knowledge) and ethics, particularly his concept of the "categorical imperative." Kant's ideas profoundly shaped Western philosophy and continue to be studied and debated today.
Took 136.66211867332458 seconds
Nothing too surprising here. All of those are reasonably accurate. There isn’t anything particularly insightful, though I did instruct it to provide only two or three sentences. The 4 billion model probably should have capitalized Critique of Pure Reason, but that is a pretty minor thing. The big takeaway here is the time difference. The 1 billion model returned in under 3 seconds, the 4 billion took 10 times as long, and the 12 billion model took over two minutes to produce its answer.
At this point smaller models look better. It seems we don’t need that many parameters to get accurate answers to questions like this.
But let’s try something more specific. My next question was “When did Immanuel Kant live, giving specific dates?” According to the Wikipedia, he was born April 22nd, 1724, and died February 12th, 1804. What does the 1 billion model say?
Agent gemma3:1b
Immanuel Kant lived from December 24, 1724, to October 18, 1804. He was born in Kutzerstadt, Prussia, and died in Königsberg, Prussia (now Kaliningrad, Russia).
Took 10.186898708343506 seconds
Well, I guess its earlier good performance was too good to be true. It got the years right, but it just made up the months and days. Also, where the hell is Kutzerstadt, Prussia? Kant famously never left his hometown of Königsberg. I didn’t even ask where he was born, so I don’t know why it felt the need to hallucinate that.
Let’s see if the 4 billion model can do better.
Agent gemma3:4b
Immanuel Kant lived from November 22, 1724, to February 12, 1804. He was a hugely influential German philosopher, primarily known for his work in metaphysics and epistemology. His major philosophical works, including Critique of Pure Reason, were published during his lifetime and continue to be studied today.
Took 64.40218877792358 seconds
At least it got the right date of death. And the day of the month for his birthday is right; it just has the wrong birth month. And hey, it capitalized Critique of Pure Reason this time! And no hallucinations about random Prussian towns. So I’ll give it half credit.
Can the 12 billion model finally get this right?
Agent gemma3:12b
Immanuel Kant was a hugely influential German philosopher who lived from April 22, 1724, to February 12, 1804. He was born in Königsberg, Prussia, and spent his entire life there, serving as a professor at the University of Königsberg. His work profoundly shaped Western philosophy and continues to be studied today
Took 161.72146368026733 seconds
Finally, we have not only the right dates, but it correctly pointed out that he spent his entire life in Königsberg. It just took over 2 and a half minutes.
This is nice and all, but Kant was a pretty well-known historical figure. I mean, if you go to your local bar and start asking random people about him, you might get a lot of blank stares, unless you are in a bar frequented by the local university’s philosophy department. But let’s try someone who is a little more esoteric. For my next example, I went with the inventor of the Clojure programming language. Oh magic LLMs, “Who is Rich Hickey?” First up, the lightweight 1 billion model:
Agent gemma3:1b
Rich Hickey is a prominent figure in the world of audio technology, most notably known for his work with Apple’s AirPods and the development of the Magic Mouse. He’s a former Apple employee and a highly sought-after expert in areas like audio processing, signal processing, and advanced audio hardware. He’s often described as a “visionary” due to his innovative and often secretive contributions to these technologies.
Took 18.96462082862854 seconds
I ran this multiple times and got answers that varied from a reclusive Google exec to the real inventor of Bitcoin, whom Nakamoto ripped off. It seems to understand he’s in the tech industry. But that’s about it. Hopefully the 4 billion model can do better.
Agent gemma3:4b
Rich Hickey is a prominent figure in the field of behavioral economics and game theory. He is best known as the creator of the Elicit framework, a methodology for eliciting and understanding complex preferences. Hickey is also a professor at Georgetown University and the founder of the company, Elicit, which develops AI-powered tools for research.
Took 64.33937740325928 seconds
I think that’s a no. Let’s try the 12 billion model.
Agent gemma3:12b
Rich Hickey is a prominent software engineer and programmer known for his work on Clojure, a functional programming language that runs on the Java Virtual Machine. He is also the creator of the "Simple Made Efferent" (SMEP) programming philosophy, which emphasizes simplicity and clear communication in software design. Hickey is highly regarded in the programming community for his deep understanding of software architecture and his contributions to functional programming.
Took 164.98658800125122 seconds
It seems to have bungled his “Simple Made Easy” talk from Strange Loop. But the model does have a general idea of who he is. Assuming you don’t mind waiting two and a half minutes, of course. It might be easier to just check the Wikipedia, though.
But wait, why don’t we just check the Wikipedia? Is there a way to have AI agents do that instead of just going off of what they memorized in the training data?
RAGs to Riches
RAG, or Retrieval-Augmented Generation, is a technique by which agents search for relevant documents and then include that text as context with the user’s query. Vector databases are useful for this, as you can store embeddings of the documents and then do a similarity search against the query to find the documents most likely to help. For my testing purposes, I didn’t store the entire Wikipedia, but only a small selection of articles. I then updated the instructions:
You are a helpful chatbot designed to provide factual information.
Answer the QUERY from the provided CONTEXT.
The provided CONTEXT is the only source of information.
If the answer to the QUERY is not in the CONTEXT, just say ‘I don’t know’.
Give two to three sentences of information
Then, after generating a context from the most relevant article, I updated the query to include the context as well as the user’s input.
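Here is a minimal sketch of that retrieval step. The embedding model, the in-memory “database”, and the cosine-similarity helper below are stand-ins for illustration; a real setup would precompute the embeddings and keep them in a vector database, but the idea is the same:

```python
import numpy as np
import ollama

# Stand-in embedding model for illustration; any local embedding model will do.
EMBED_MODEL = "nomic-embed-text"

# A real setup would precompute these embeddings and store them in a vector database
# instead of re-embedding every article on each query.
articles = {
    "Immanuel Kant": "Immanuel Kant (22 April 1724 - 12 February 1804) was a German philosopher ...",
    "Rich Hickey": "Rich Hickey is a computer programmer and the creator of Clojure ...",
}

def embed(text: str) -> np.ndarray:
    """Turn a piece of text into an embedding vector."""
    return np.array(ollama.embeddings(model=EMBED_MODEL, prompt=text)["embedding"])

def most_relevant(query: str) -> str:
    """Return the article text most similar (by cosine similarity) to the query."""
    q = embed(query)
    def cosine(text: str) -> float:
        v = embed(text)
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    return max(articles.values(), key=cosine)

def build_prompt(query: str) -> str:
    """Combine the retrieved context and the user's query into one prompt."""
    return f"CONTEXT:\n{most_relevant(query)}\n\nQUERY:\n{query}"
```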
Let’s see how our little 1 billion parameter model works with this system. First, “Who is Immanuel Kant?”
Agent gemma3:1b
Immanuel Kant was a German philosopher and one of the central Enlightenment thinkers. Born in Königsberg, he is best known for his work in epistemology, metaphysics, ethics, and aesthetics, and his doctrine of transcendental idealism. He wrote the Critique of Pure Reason (1781/1787), his most famous work. He believed reason is the source of morality and that aesthetics arises from a faculty of disinterested judgment. Kant hoped for perpetual peace through international cooperation and altered his views on scientific racism in the later years of his life.
Took 14.352574586868286 seconds
It took a little longer to generate this response, though that includes the time it took to look up the most relevant Wikipedia page and pull the article’s contents. It did exceed the two-to-three-sentence instruction. That is actually something I noticed: when the context got longer, this model in particular tended to forget its instructions. In fact, I had to truncate the text or it would ignore the instructions entirely and just summarize the article.
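The truncation was nothing clever; a simple character cap (the limit below is an arbitrary number, not the one I actually used) was enough to keep the 1 billion model on task:

```python
MAX_CONTEXT_CHARS = 4000  # arbitrary cap; tune to the model's context window

def truncate(context: str) -> str:
    """Cut the retrieved article down before it crowds out the instructions."""
    return context[:MAX_CONTEXT_CHARS]
```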
The other two models did just fine.
Agent gemma3:4b
Immanuel Kant (born Emanuel Kant; 22 April 1724 – 12 February 1804) was a German philosopher and one of the central Enlightenment thinkers. Born in Königsberg, Kant's comprehensive and systematic works in epistemology, metaphysics, ethics, and aesthetics have made him one of the most influential and highly discussed figures in modern Western philosophy.
Took 66.67007899284363 seconds
Agent gemma3:12b
Immanuel Kant was a German philosopher and one of the central Enlightenment thinkers. He is known for his comprehensive and systematic works in epistemology, metaphysics, ethics, and aesthetics, which have made him one of the most influential figures in modern Western philosophy. Kant's doctrine of transcendental idealism argued that space and time are mere "forms of intuition" that structure all experience.
Took 214.0527482032776 seconds
Ok, but even the small model had no trouble answering that question without the Wikipedia’s help. What about the more specific date question? “When did Immanuel Kant live, giving specific dates?”
Agent gemma3:1b
Immanuel Kant lived from October 22, 1724, to October 12, 1804. He was born in Königsberg, Prussia (now Kaliningrad, Russia), and died in Königsberg.
Took 18.28734564781189 seconds
Remember, the correct dates were April 22nd, 1724, and February 12th, 1804. It gets the years and the days of the month right, and this time it correctly states that he both lived and died in Königsberg. But it’s still having trouble with the months.
Did the 4 billion and 12 billion models do any better?
Agent gemma3:4b
Immanuel Kant lived from April 22, 1724, to February 12, 1804. He was a hugely influential German philosopher who developed transcendental idealism. His major works, including Critique of Pure Reason, were published during his lifetime.
Took 63.93188738822937 seconds
Agent gemma3:12b
Immanuel Kant lived from April 22, 1724, to February 12, 1794. He was born in Königsberg, East Prussia, and died in the same city. Kant was a hugely influential German philosopher.
Took 161.58075857162476 seconds
Mostly, yes. The 4 billion model got the dates exactly right, though the 12 billion model somehow shifted the death year to 1794.
And how did they do with Rich Hickey?
Agent gemma3:1b
Rich Hickey is a computer programmer and creator of the Clojure programming language, known for developing tools like ClojureScript and Extensible Data Notation (EDN). He has also worked on scheduling systems, database design, and audio analysis. Before Clojure, he developed dotLisp, a similar language based on the .NET Framework, and has spent considerable time on projects related to machine listening and fingerprinting.
Took 19.167198181152344 seconds
Agent gemma3:4b
Rich Hickey is a computer programmer and speaker best known as the creator of the Clojure programming language. He also created ClojureScript and the Extensible Data Notation (EDN) data format. Before Clojure, he developed dotLisp, and he has worked on various projects including scheduling systems and database design.
Took 65.9563455581665 seconds
Agent gemma3:12b
Rich Hickey is a computer programmer and speaker, best known as the creator of the Clojure programming language. He also created or designed ClojureScript and the Extensible Data Notation (EDN) data format. Before Clojure, he developed dotLisp, a similar project based on the .NET Framework.
Took 189.12884402275085 seconds
None of them seemed to have any problem at all.
Conclusion
So it seems that while smaller models do have limitations, techniques such as RAG can help. Information that would normally require a model several times as large to remember can be provided to it directly. This improves reliability, reduces hallucinations, and makes small models useful for even more esoteric subjects.
But that brings up an interesting possibility. Within those billions of parameters are stored details such as when Immanuel Kant was born, what nation Königsberg was part of in the 18th century, the names of the people who created different programming languages, and so on. None of those details are needed in the vast majority of a model’s applications. If we use techniques such as RAG to look them up on demand, they are arguably never needed. And carrying around all this random information makes models prone to hallucinations whenever the thing they are trying to recall resembles something better represented in their training data.
There is an old quote attributed to Albert Einstein: “Never memorize something that you can look up.” But these models are forced to “memorize” these details as part of their training, which requires them to reconstruct texts in their training set. A common training task is giving the model a sentence or two with some missing words and having it fill them in. In effect, it’s giving the models a closed-book exam on trivial details found in the Wikipedia.
But what if instead we gave them an open-book exam? I suspect this could be accomplished by making more context available during the training tasks. Instead of giving the model just one or two sentences, give it an entire article with one sentence that needs to be reconstructed. Then training would focus not on memorizing details but on finding the correct answer in the provided text. This could, in theory, allow smaller and cheaper models to handle more complex tasks, provided they are given reliable sources of information.
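To make that concrete, here is a purely illustrative sketch of how such training examples might be constructed (nothing here comes from an actual training pipeline): the closed-book version blanks out words in an isolated sentence, while the open-book version hands the model the whole article and asks it to reconstruct one missing sentence.

```python
import random

def closed_book_example(article: str) -> tuple[str, str]:
    """The usual style: a single sentence with some words blanked out, answered from memory."""
    sentences = article.split(". ")
    target = random.choice(sentences)
    blanked = " ".join("____" if random.random() < 0.15 else word for word in target.split())
    return blanked, target

def open_book_example(article: str) -> tuple[str, str]:
    """The proposed style: the whole article as context, with one sentence to reconstruct."""
    sentences = article.split(". ")
    idx = random.randrange(len(sentences))
    target = sentences[idx]
    context = ". ".join(sentences[:idx] + ["____"] + sentences[idx + 1:])
    return context, target
```

The difference is only in how the examples are built; whether training on the open-book variety would actually push models toward better retrieval behavior is, of course, an empirical question.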