Deploy AI Models Locally

Ollama

What Is Ollama?

Ollama is an open-source platform designed to make large language models (LLMs) more accessible and usable.

How to Deploy

  1. Visit Ollama and download the installer.
  2. Choose a model you like.
  3. Run `ollama run YourModel:ParameterSize` in the terminal or Windows CMD (Windows: Win+R → cmd).
  4. The model will start downloading. If it has already been downloaded, you will enter the chat page (the prompt starts with `>>>`).
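Besides the interactive `>>>` prompt, a running Ollama instance also serves a local REST API on port 11434. Here is a minimal Python sketch of calling it; the model name is just an example, and any model you have pulled works:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model, prompt):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,      # e.g. "gemma2:9b" -- any model you have pulled
        "prompt": prompt,
        "stream": False,     # return one JSON object instead of a token stream
    }

def generate(model, prompt):
    """Send a prompt to the locally running Ollama server and return the reply text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("gemma2:9b", "Why is the sky blue?")` returns the model's answer as a string.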

How to Choose An Appropriate Model

Choose Model

Visit Model Library

Here is the Artificial Analysis LLM Leaderboard.

Recommendations:

Choose Parameter Size

The model's download size should be smaller than your available memory.

For instance, the gemma2:9b model is 5.4GB, which is runnable on a device with 16GB of memory.

However, the gemma2:27b model is 16GB, which is not runnable on the same device, since the operating system and other programs also take up part of that memory.
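As a rule of thumb, you can estimate a model's footprint from its parameter count and quantization width. A hedged Python sketch: the 4.8 bits-per-parameter figure is my assumption back-fitted to the sizes above (Ollama's default downloads are roughly 4-5 bits per parameter), and the 4GB headroom for the OS is likewise an assumption:

```python
def estimated_size_gb(params_billion: float, bits_per_param: float = 4.8) -> float:
    """Rough model footprint in GB: parameters x bits per parameter, in bytes."""
    # 1e9 params * bits / 8 bits-per-byte = bytes, which is 1e9 * GB
    return params_billion * bits_per_param / 8

def fits_in_memory(params_billion: float, memory_gb: float,
                   headroom_gb: float = 4.0) -> bool:
    """Leave headroom for the OS and other programs (assumed 4 GB here)."""
    return estimated_size_gb(params_billion) <= memory_gb - headroom_gb
```

This reproduces the numbers above: `estimated_size_gb(9)` gives 5.4GB (fits in 16GB), while `estimated_size_gb(27)` gives about 16.2GB (does not fit).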

Drawbacks of Running in Terminal

Your chat history cannot be saved. If you want to store your chat history, deploy one of the following apps (the titles link to their GitHub repos):

Enchanted

Download it from the App Store.

Open WebUI

  1. Download Docker
  2. Follow the Instructions

MoA

Together MoA Repo

Mixture of Agents (MoA) is a novel approach that leverages the collective strengths of multiple LLMs to enhance performance, achieving state-of-the-art results.
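The core idea can be sketched in a few lines: several "proposer" models each answer the question, and an "aggregator" model synthesizes their answers into one reply. Everything below (the aggregation prompt, the injected `query` helper, one-round structure) is illustrative, not the Together MoA implementation:

```python
def build_aggregation_prompt(question: str, answers: list[str]) -> str:
    """Combine the proposer answers into a prompt for the aggregator model."""
    numbered = "\n".join(f"Answer {i + 1}: {a}" for i, a in enumerate(answers))
    return (
        "Synthesize the candidate answers below into one high-quality reply.\n"
        f"Question: {question}\n{numbered}"
    )

def mixture_of_agents(question: str, query, proposers: list[str],
                      aggregator: str) -> str:
    """One MoA round: each proposer answers, then the aggregator merges them.

    `query(model, prompt)` is any function that calls an LLM -- for Ollama it
    could wrap the local REST API; here it is just an injected dependency.
    """
    answers = [query(m, question) for m in proposers]
    return query(aggregator, build_aggregation_prompt(question, answers))
```

The full method runs several such rounds, feeding each round's aggregated output back to the proposers, but one round already shows the structure.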

The original MoA does not support Ollama. Here is my edited Ollama version of MoA, forked from severian42's MoA-Ollama-Chat; I edited the GUI to make it look better.

Follow the Instructions here.
