# Deploy AI Models Locally

## Ollama

### What Is Ollama?
Ollama is an open-source platform designed to make large language models (LLMs) more accessible and usable.
### How to Deploy
- Visit the Ollama website and download the installer.
- Choose a model you like.
- Run `ollama run YourModel:ParameterSize` in the terminal or Windows CMD (Windows: Win+R → cmd). Your model will start downloading; if it has already been downloaded, you will enter the chat page (the prompt starts with `>>>`).
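Besides the interactive chat, a running Ollama instance also exposes an HTTP API on port 11434 by default, which you can call from your own scripts. Here is a minimal sketch using only the standard library; the model name `gemma2:9b` is just an example and must already be pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    # stream=False asks for one complete JSON response instead of a stream of chunks
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    """Send one prompt to a locally running Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires `ollama serve` running and the model pulled):
# print(generate("gemma2:9b", "Say hello in one word."))
```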
## How to Choose an Appropriate Model

### Choose a Model
Visit the Ollama Model Library to browse available models.
Here is Artificial Analysis's LLM Leaderboard.
Recommendations:
### Choose a Parameter Size
The model's download size should be smaller than your available memory.
For instance, the `gemma2:9b` model is 5.4GB, which is runnable on a device with 16GB of memory. However, the `gemma2:27b` model is 16GB, and it is not runnable on the same device, since other running programs already eat up part of that memory.
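The sizes above follow roughly from parameter count times bytes per weight. As a back-of-the-envelope sketch (the 4.5 bits/weight figure is my assumption for Ollama's default ~4-bit quantized downloads plus metadata, not an official number):

```python
def estimate_model_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Rough size of a quantized model from its parameter count.

    bits_per_weight defaults to 4.5 as an assumed average for
    ~4-bit quantization including overhead; use 16 for fp16 weights.
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billion * bytes_per_weight


print(round(estimate_model_size_gb(9), 1))   # about 5.1 -- close to gemma2:9b's 5.4GB
print(round(estimate_model_size_gb(27), 1))  # about 15.2 -- close to gemma2:27b's 16GB
```

The estimate lands in the right ballpark for both models in the text, which is all you need when deciding whether a model will fit in memory.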
## Drawbacks of Running in the Terminal
Your chat history cannot be saved. If you want to keep your chat history, deploy one of the following apps (the titles link to their GitHub repos):
### Enchanted
### Open WebUI
- Download and install Docker.
- Follow the instructions.
### MoA
Mixture of Agents (MoA) is a novel approach that leverages the collective strengths of multiple LLMs to enhance performance, achieving state-of-the-art results.
The original MoA does not support Ollama. Here is my edited Ollama version of MoA, forked from severian42's MoA-Ollama-Chat; I edited the GUI to make it look better.
Follow the Instructions here.
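The core MoA idea is simple: several "proposer" models each answer the question, and an "aggregator" model synthesizes their answers into one. This is an illustrative sketch of that aggregation step, not code from the MoA repo; the prompt wording is my own:

```python
def build_aggregator_prompt(question: str, answers: list[str]) -> str:
    """Combine candidate answers from several proposer models into a
    single prompt for the aggregator model (the MoA idea, simplified)."""
    # Number each candidate so the aggregator can refer to them
    numbered = "\n".join(f"{i + 1}. {a}" for i, a in enumerate(answers))
    return (
        "You are given several candidate answers to the question below. "
        "Synthesize them into a single, higher-quality answer.\n\n"
        f"Question: {question}\n\n"
        f"Candidate answers:\n{numbered}"
    )


# In a full pipeline you would collect `answers` by sending the question
# to several Ollama models, then send this prompt to the aggregator model.
prompt = build_aggregator_prompt("What is 2+2?", ["4", "It is four."])
```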