Deploy AI Models Locally

Ollama

What Is Ollama?

Ollama is an open-source platform designed to make large language models (LLMs) more accessible and usable.

How to Deploy

  1. Visit Ollama and download the installer.
  2. Choose a model you like.
  3. Run `ollama run YourModel:ParameterSize` in the terminal or Windows CMD (Windows: Win+R → cmd).
  4. The model will start downloading. If it has already been downloaded, you will enter the chat page (the prompt starts with `>>>`).
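Besides the interactive `>>>` prompt, a running Ollama instance also serves a local REST API on port 11434. Here is a minimal Python sketch of calling it; the model name is just an example, and any model you have pulled works:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model, prompt):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,      # e.g. "gemma2:9b" -- any model you have pulled
        "prompt": prompt,
        "stream": False,     # return one JSON object instead of a token stream
    }

def generate(model, prompt):
    """Send a prompt to the locally running Ollama server and return the reply text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("gemma2:9b", "Why is the sky blue?")` returns the model's answer as a string.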

How to Choose An Appropriate Model

Choose Model

Visit Model Library

Here is the Artificial Analysis LLM Leaderboard.

Recommendations:

Choose Parameter Size

The model's download size should be smaller than your available memory.

For instance, the gemma2:9b model is 5.4GB, which is runnable on a device with 16GB of memory.

However, the gemma2:27b model is 16GB, which is not runnable on the same device, since the operating system and other programs also take up part of that memory.
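As a rule of thumb, you can estimate a model's footprint from its parameter count and quantization width. A hedged Python sketch: the 4.8 bits-per-parameter figure is my assumption back-fitted to the sizes above (Ollama's default downloads are roughly 4-5 bits per parameter), and the 4GB headroom for the OS is likewise an assumption:

```python
def estimated_size_gb(params_billion: float, bits_per_param: float = 4.8) -> float:
    """Rough model footprint in GB: parameters x bits per parameter, in bytes."""
    # 1e9 params * bits / 8 bits-per-byte = bytes, which is 1e9 * GB
    return params_billion * bits_per_param / 8

def fits_in_memory(params_billion: float, memory_gb: float,
                   headroom_gb: float = 4.0) -> bool:
    """Leave headroom for the OS and other programs (assumed 4 GB here)."""
    return estimated_size_gb(params_billion) <= memory_gb - headroom_gb
```

This reproduces the numbers above: `estimated_size_gb(9)` gives 5.4GB (fits in 16GB), while `estimated_size_gb(27)` gives about 16.2GB (does not fit).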

Drawbacks of Running in Terminal

Your chat history cannot be saved. If you want to store your chat history, deploy one of the following apps (the titles link to their GitHub repos):

Enchanted

Download it from the App Store.

Open WebUI

  1. Download Docker
  2. Follow the Instructions

MoA

Together MoA Repo

Mixture of Agents (MoA) is a novel approach that leverages the collective strengths of multiple LLMs to enhance performance, achieving state-of-the-art results.
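The core idea can be sketched in a few lines: several "proposer" models each answer the question, and an "aggregator" model synthesizes their answers into one reply. Everything below (the aggregation prompt, the injected `query` helper, one-round structure) is illustrative, not the Together MoA implementation:

```python
def build_aggregation_prompt(question: str, answers: list[str]) -> str:
    """Combine the proposer answers into a prompt for the aggregator model."""
    numbered = "\n".join(f"Answer {i + 1}: {a}" for i, a in enumerate(answers))
    return (
        "Synthesize the candidate answers below into one high-quality reply.\n"
        f"Question: {question}\n{numbered}"
    )

def mixture_of_agents(question: str, query, proposers: list[str],
                      aggregator: str) -> str:
    """One MoA round: each proposer answers, then the aggregator merges them.

    `query(model, prompt)` is any function that calls an LLM -- for Ollama it
    could wrap the local REST API; here it is just an injected dependency.
    """
    answers = [query(m, question) for m in proposers]
    return query(aggregator, build_aggregation_prompt(question, answers))
```

The full method runs several such rounds, feeding each round's aggregated output back to the proposers, but one round already shows the structure.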

The original MoA does not support Ollama. Here is my edited Ollama version of MoA, forked from severian42's MoA-Ollama-Chat; I edited the GUI to make it look better.

Follow the Instructions here.
