Felafax is a framework for continued training and fine-tuning of open-source LLMs on the XLA runtime. We take care of the necessary runtime setup and provide a Jupyter notebook out of the box so you can get started right away.
  • Easy to use.
  • Easy to configure all aspects of training (designed for ML researchers and hackers).
  • Easy to scale training from a single TPU VM with 8 cores to an entire TPU Pod containing 6,000 TPU cores (1000X)!
Our goal at Felafax is to build infrastructure that makes it easier to run AI workloads on non-NVIDIA hardware (TPUs, AWS Trainium, AMD GPUs, and Intel GPUs).

✨ Finetune for Free

Add your dataset, click “Run All”, and you’ll fine-tune on a free TPU provided by Google Colab!
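In practice, "adding your dataset" mostly means loading it and mapping it into a plain-text prompt/response format before the training cells run. Below is a minimal sketch using the Hugging Face `datasets` library; the dataset name and the `instruction`/`output` field names are placeholders for whatever your data actually uses, not the notebook's exact code.

```python
from datasets import load_dataset

# Placeholder dataset -- swap in your own Hub dataset or local files.
raw = load_dataset("yahma/alpaca-cleaned", split="train")

def to_text(example):
    # Assumed field names; adjust to match your dataset's schema.
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"
    return {"text": prompt}

train_data = raw.map(to_text, remove_columns=raw.column_names)
print(train_data[0]["text"][:200])
```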

Currently Supported Models

LLaMa-3.1 JAX Implementation

Our flagship implementation offering maximum performance and scalability across hardware platforms.

Core Features

• Ported to JAX from PyTorch
• Supported Models: 1B, 3B, 8B, 70B, and 405B
• Training Modes: Full-precision, LoRA (see the LoRA sketch below)
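For context on the LoRA mode: each frozen weight matrix W is augmented with a trainable low-rank update (alpha / r) · B · A, so only the small A and B matrices receive gradients. The sketch below illustrates that computation in plain JAX; the shapes, rank, and initialization are illustrative assumptions, not Felafax's actual implementation.

```python
import jax
import jax.numpy as jnp

def init_lora(key, d_in, d_out, rank=8):
    """Low-rank adapter: A is small random, B starts at zero so the adapted
    layer initially behaves exactly like the frozen base layer."""
    A = jax.random.normal(key, (rank, d_in)) * 0.01
    B = jnp.zeros((d_out, rank))
    return {"A": A, "B": B}

def lora_linear(x, W_frozen, lora, alpha=16.0, rank=8):
    """y = x @ (W + (alpha / rank) * B @ A)^T -- only `lora` is trained."""
    delta = (alpha / rank) * (lora["B"] @ lora["A"])
    return x @ (W_frozen + delta).T

# Toy usage: one adapted projection.
key = jax.random.PRNGKey(0)
W = jax.random.normal(key, (32, 64))        # frozen base weight (d_out, d_in)
adapter = init_lora(key, d_in=64, d_out=32)
x = jnp.ones((4, 64))
y = lora_linear(x, W, adapter)
print(y.shape)  # (4, 32)
```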

Key Benefits

• Run efficiently across a wide range of hardware: TPUs, AWS Trainium, NVIDIA, and AMD
• Scale seamlessly across multiple accelerators (see the sharding sketch below)
• Hardware-optimized XLA backend
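Scaling across accelerators in JAX generally comes down to describing a device mesh and a sharding for your arrays; XLA then inserts the cross-device communication. The sketch below shows that general pattern with `jax.sharding`; the mesh axis name and the array being sharded are illustrative and do not reflect Felafax's actual parallelism layout.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One-dimensional mesh over all available devices (8 on a single TPU VM,
# far more on a pod slice). The same code runs unchanged on one CPU device.
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))

# Shard a batch along its leading dimension across the "data" axis.
batch = jnp.ones((len(devices) * 4, 128))
sharded_batch = jax.device_put(batch, NamedSharding(mesh, P("data", None)))

# jit-compiled functions consume and produce sharded arrays transparently.
@jax.jit
def scale(x):
    return x * 2.0

out = scale(sharded_batch)
print(out.sharding)
```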

LLaMa-3/3.1 PyTorch XLA

Meta’s PyTorch implementation with XLA support for TPU compatibility.

Features

• Architecture: PyTorch XLA
• Training Modes: Full-precision, LoRA
• Code pointer
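For the PyTorch XLA path, the usual pattern is to move the model and batches to an XLA device and let `torch_xla` compile and execute the accumulated graph at each step boundary. The snippet below is a generic sketch of that loop with a tiny linear model standing in for LLaMA; it is not the repository's actual training code.

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                    # a TPU core exposed as a torch device
model = torch.nn.Linear(128, 2).to(device)  # toy stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

for step in range(10):
    x = torch.randn(8, 128, device=device)
    y = torch.randint(0, 2, (8,), device=device)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Applies the update and marks the XLA step boundary, triggering
    # compilation/execution of the graph built up so far.
    xm.optimizer_step(optimizer, barrier=True)
    print(step, loss.item())
```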