My Local AI Setup
Written by Jeff on December 22, 2024
Current Gaming Computer Setup Turned AI Server
- 💻 Processor: Intel® Core™ i9-13900KS
- 🖥️ Mainboard: ASUS ROG Strix Z790-E Gaming WiFi II LGA 1700
- 🧠 RAM: 128GB DDR5
- 🎮 GPU: NVIDIA RTX 4090
- 💾 SSD: 1TB (Samsung 980 Pro)
- 🌐 Network: Upgraded to 10G (previously using the built-in 2.5G Ethernet on the mainboard)
The move to 10G networking fits my preference for faster, more reliable connections, and I already had a 40G router in my homelab to take advantage of it.
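A quick way to confirm an upgrade like this actually delivers is an iperf3 run between two machines on the LAN. A minimal sketch, where nas.local is a placeholder for another 10G-connected host:

```bash
# On another 10G host, start an iperf3 server:
iperf3 -s

# From the AI server, measure throughput to it (nas.local is a placeholder):
iperf3 -c nas.local
# A healthy 10G link should land near 9.4 Gbits/sec.
```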
AI Models in Use
With my current setup, I utilize high-performance AI models tailored for different use cases:
Large-Scale Tasks and Experimentation
- 🚀 Model: Llama 3.2 Vision (90B Parameters)
- 📝 Details: This model is far too large for the RTX 4090's 24GB of VRAM, so it leans heavily on system RAM and delivers moderate inference speeds. It excels at resource-intensive tasks and advanced experimentation (launch sketch below).
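For reference, this is how I launch it and check where the weights landed. The llama3.2-vision:90b tag is my assumption of the Ollama library name, so confirm it against the library before pulling:

```bash
# Pull and run the 90B vision model (tag assumed from the Ollama library):
ollama run llama3.2-vision:90b

# In another terminal, inspect placement; on a 24GB GPU,
# most of a 90B model's weights end up in system RAM:
ollama ps
```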
Daily Productivity
Text-Only Models
- 🗒️ Model: Llama 3.3 (70B Parameters)
- 📝 Details: Ideal for advanced natural language processing tasks, this model delivers robust and reliable performance for daily use.
Multimodal Models
- 🖼️ Model: Llama 3.2 Vision (11B Parameters)
- 📝 Details: Striking a balance between performance and efficiency, this model is excellent for day-to-day multimodal processing.
- 🔍 Model: InternVL2 (26B Parameters)
- 📝 Details: With advanced vision-language capabilities, this model excels at complex multimodal tasks while maintaining efficiency for regular use (pull commands below).
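To get these daily drivers onto the box, the pulls below are what I'd expect to work. The Llama tags are my reading of the public Ollama library; InternVL2 isn't there under that exact name as far as I know, so it usually means importing a GGUF build by hand (the file path is a placeholder):

```bash
# Pull the two Llama models (tags assumed from the Ollama library):
ollama pull llama3.3:70b
ollama pull llama3.2-vision:11b

# InternVL2 generally needs a manual import from a GGUF file
# (./internvl2-26b.gguf is a placeholder path):
cat > Modelfile <<'EOF'
FROM ./internvl2-26b.gguf
EOF
ollama create internvl2:26b -f Modelfile
```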
AI Deployment
I deploy my AI projects using Ollama.
Installing Ollama on Fedora 41
- 🔄 Update Fedora: Keep your system updated:
sudo dnf update -y
- 🛠️ Install Prerequisites: Install essential build tools and libraries:
sudo dnf install -y gcc make cmake git curl wget
- 🎮 Install NVIDIA Drivers:
- Open the Software Center.
- Search for "NVIDIA drivers" and install the appropriate ones for your GPU.
- Follow the guided steps for enabling Secure Boot if necessary. (A command-line alternative is sketched below.)
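If you'd rather stay in the terminal, the usual Fedora route is RPM Fusion plus the akmod driver package. A minimal sketch of that path; double-check the repository URLs against rpmfusion.org before running:

```bash
# Enable the RPM Fusion free and nonfree repositories:
sudo dnf install \
  https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm \
  https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm

# Install the NVIDIA kernel-module driver and CUDA support:
sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda

# After the akmod finishes building (and a reboot), confirm the GPU is visible:
nvidia-smi
```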
- 🐋 Set Up Docker (Optional): For containerized environments, install Docker (a sketch of running Ollama itself in a container follows):
sudo dnf install -y docker
sudo systemctl start docker
sudo systemctl enable docker
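With Docker in place, Ollama can also run fully containerized instead of natively. This mirrors the pattern from Ollama's Docker documentation and assumes the NVIDIA container toolkit is installed for GPU passthrough:

```bash
# Run Ollama as a GPU-enabled container (requires nvidia-container-toolkit):
docker run -d --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama

# Model commands are then run against the container:
docker exec -it ollama ollama run llama3.2
```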
- 📥 Download and Install Ollama:
- Visit the Ollama website for the latest version compatible with Fedora.
- Use the terminal for installation:
curl -fsSL https://ollama.ai/install.sh | sh
- ✅ Verify Installation:
- Check the installation:
ollama --version
- Test a model (llama3.2 here, or any model you have pulled):
ollama run llama3.2
By following these steps, I successfully set up Ollama on Fedora 41, ensuring smooth operation with my NVIDIA RTX 4090 GPU.
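Beyond the CLI, the Ollama server listens on localhost:11434, and that HTTP API is how my projects actually talk to it. A minimal request sketch; the model name is whatever you have pulled:

```bash
# Ask the local Ollama server for a one-shot (non-streaming) completion:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```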
Model Recommendations
Small Models (<1B Parameters)
- SmolLM: 135M, 360M
- Qwen2.5: 0.5B
Medium Models (1B - 3B Parameters)
- Llama 3.2: 1B & 3B
- Qwen2.5: 1.5B & 3B
Sweet Spot Models (~7B Parameters)
These models are ideal for most modern systems; a quick way to try a few is sketched after this list:
- Llama 3.1: 8B (slightly above 7B but noteworthy)
- Mistral 7B
- Ministral 8B 24.10: Successor to Mistral 7B
- Qwen2.5: 7B
- Qwen2-VL-7B: Leading multimodal model in this range
- Zephyr-7b-beta: Fine-tuned from Mistral 7B
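The tags below are my assumptions of the Ollama library names for these models; check the library page if a pull fails:

```bash
# Try a few of the ~7B sweet-spot models (tags assumed from the Ollama library):
ollama pull mistral        # Mistral 7B
ollama pull qwen2.5:7b     # Qwen2.5 7B
ollama run zephyr          # Zephyr 7B beta, fine-tuned from Mistral 7B
```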
Large Models (11B - 14B Parameters)
For advanced tasks requiring higher specifications:
- Llama 3.2 Vision: 11B (my go-to multimodal model)
- Pixtral-12B-2409: Multimodal model by Mistral AI
- StableLM 2: 12B
- Qwen2.5: 14B
Advanced Models (20B+ Parameters)
Coding Assistants
- Qwen2.5-Coder: 32B
- Deepseek-coder-v2: 16B (lite) or 236B (full). The 236B version is impractical for most hobbyists.
General Use
- Llama 3.3: 70B
- Qwen2.5: 72B
- Hermes3: 70B
- Sailor2: 20B (specialized for Southeast Asian languages)
Math & Calculation
- Command-R: 35B
- Deepseek-llm: 67B (also excellent for coding tasks)
Additional Notes
- Moondream: 1.8B (a small vision model)
- Llava: 13B (previously my go-to multimodal model; see the image-prompt sketch below)
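For any of the vision models here, the Ollama CLI lets you reference a local image directly in the prompt. A minimal sketch, with ./photo.jpg as a placeholder path:

```bash
# Describe a local image with a vision model (path is a placeholder):
ollama run llama3.2-vision:11b "What is in this image? ./photo.jpg"
```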
Models I Aspire to Run Locally
- DeepSeek V2.5: 236B
- Mistral Large 24.11: 123B
- Zephyr Orpo: 141B
Running models with 20B+ parameters is usually the territory of dedicated enthusiasts or enterprise-grade AI deployments, demanding robust hardware and significant resources; the arithmetic below shows why.
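As a rough rule of thumb, weight memory scales with parameter count times bytes per weight, before any KV cache or runtime overhead. A back-of-envelope sketch (numbers are illustrative):

```bash
# Rough weight-memory estimate: params (billions) x bits-per-weight / 8 = GB.
for model in "70 4" "123 4" "236 4"; do
  set -- $model
  echo "${1}B params @ ${2}-bit quant ≈ $(( $1 * $2 / 8 )) GB of weights"
done
# 70B  @ 4-bit ≈ 35 GB  -> fits in 128GB of system RAM, far beyond 24GB of VRAM
# 123B @ 4-bit ≈ 61 GB
# 236B @ 4-bit ≈ 118 GB -> why DeepSeek V2.5 stays on the aspirational list
```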