TinyLLM

A Framework for Training and Deploying Language Models at the Edge
Version v0.1
Updated
Authors Savitha Viswanadh Kandala, Pramuka Medaranga, Ambuj Varshney
License Apache-2.0

Introduction

Language models exhibit numerous capabilities that are well suited to embedded sensing, for example interacting with end users and analysing sensor information. However, their large parameter counts impose significant memory and processing requirements for inference. This makes on-device inference challenging on mobile and edge platforms, often necessitating remotely hosted models accessed via network calls. Remote inference, in turn, introduces latency, unreliable network connectivity, and privacy concerns around sharing sensitive sensor data with third parties.

We hypothesize that, when trained on carefully curated datasets and fine-tuned appropriately, small language models (30–120M parameters) can achieve accuracy comparable to much larger models while remaining feasible to deploy on even the most resource-constrained edge platforms. To this end, we present a framework that enables training such foundational models, offering both high token generation rates and high accuracy. Furthermore, the framework simplifies the deployment of these tiny language models at the edge.

Language models on edge devices offer numerous capabilities for embedded sensing: they enable interaction with devices, analysis of sensor data, localization of robots, and much more. TinyLLM simplifies the training and deployment of these tiny models!

Overview

TinyLLM: you provide the data, and TinyLLM generates a custom-trained tiny language model tailored to your embedded sensing application. TinyLLM also simplifies the deployment of these models on resource-constrained platforms!

TinyLLM offers the following functionality to the end-users:

1. Dataset Preparation: Users can prepare their own sensing datasets for model pre-training. The framework provides tools to preprocess the data and convert it into a format suitable for pre-training.

2. Pre-Training: Users can train custom language models using their own datasets. The framework builds upon the GPT-2 architecture and offers foundational models of varying parameter sizes (30M, 51M, 82M, 101M, and 124M) as starting points for training custom models.

3. Fine-Tuning: Users can fine-tune custom or off-the-shelf pre-trained models with domain-specific data for downstream embedded sensing tasks.

4. Deployment: Users can deploy trained models on even the most constrained edge devices, such as single-board computers (SBCs) like the Raspberry Pi and Orange Pi. The framework includes provisions to convert trained models into formats optimized for edge deployment; a minimal end-to-end sketch of this workflow follows this list.
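To make the workflow above concrete, here is a minimal end-to-end sketch in Python. It is only illustrative: TinyLLM's actual scripts and interfaces may differ, the Hugging Face transformers and datasets calls stand in for whatever tooling the framework uses internally, and the file name, training settings, and scaled-down GPT-2 configuration are assumptions rather than the framework's exact recipe.

    # Illustrative sketch only -- not TinyLLM's actual API.
    from datasets import load_dataset
    from transformers import (DataCollatorForLanguageModeling, GPT2Config,
                              GPT2LMHeadModel, GPT2TokenizerFast, Trainer,
                              TrainingArguments)

    # 1. Dataset preparation: tokenize raw sensor text into model-ready examples.
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

    raw = load_dataset("text", data_files={"train": "sensor_corpus.txt"})  # hypothetical file

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    train_ds = (raw["train"]
                .map(tokenize, batched=True, remove_columns=["text"])
                .filter(lambda ex: len(ex["input_ids"]) > 0))

    # 2. Pre-training: a scaled-down GPT-2; this config is a rough guess at a
    #    model in the tens-of-millions-of-parameters range, not TinyLLM's recipe.
    config = GPT2Config(vocab_size=tokenizer.vocab_size, n_layer=6, n_head=8, n_embd=512)
    model = GPT2LMHeadModel(config)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="tiny-pretrained", num_train_epochs=1),
        train_dataset=train_ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()

    # 3. Fine-tuning: rerun the same loop on a smaller task-specific dataset
    #    (e.g. prompts pairing sensor readings with gesture or location labels).

    # 4. Deployment: save the checkpoint, then convert it to an edge-friendly
    #    format (e.g. a llama.cpp-compatible GGUF file); the exact conversion
    #    tool depends on the target runtime.
    model.save_pretrained("tiny-pretrained")
    tokenizer.save_pretrained("tiny-pretrained")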

Result Highlights

TinyLLM has been used to train and fine-tune small custom models for two downstream tasks: Hand Gesture Recognition and Robot Localization using in-house datasets. The accuracy of these models was compared to that of popular off-the-shelf models:

Use case 1: Hand Gesture Recognition

Model              # Parameters (Billion)   Model Size (GiB)   Accuracy (%)
Tiny Model         0.10                      0.277              98.44
Tiny Model         0.12                      0.329              93.02
Microsoft Phi 3    3.82                      7.2                100
Meta Llama 3       8.03                      15                 93

Use case 2: Robot Localization

Model              # Parameters (Billion)   Model Size (GiB)   Accuracy (%)
Tiny Model         0.08                      0.231              98.57
Tiny Model         0.12                      0.329              100
Microsoft Phi 3    3.82                      7.2                100
Meta Llama 3       8.03                      15                 100

The results demonstrate that the custom models trained using TinyLLM either outperform or perform on par with the off-the-shelf models in terms of accuracy. Furthermore, these custom models are significantly smaller in size, making them ideal for deployment on edge devices.

Note: These models were pre-trained on a mix of general IoT sensor data before being fine-tuned for specific downstream tasks.

How to get started?

Follow the instructions provided in the GitHub repository.

Models

Pre-trained models, trained on a mix of sensor and web data, are available for download on our Hugging Face page.
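As a quick sanity check after downloading, a checkpoint published in the standard Hugging Face format can usually be loaded with the transformers auto classes. The repository ID below is a placeholder; substitute the actual model name from our Hugging Face page.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder repository ID -- replace with the actual model name
    # listed on the TinyLLM Hugging Face page.
    repo_id = "your-org/tinyllm-30m"

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)

    # Generate a few tokens to confirm the model loads and runs.
    inputs = tokenizer("Sensor reading:", return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output[0], skip_special_tokens=True))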

Publications

1. Savitha Viswanadh Kandala, Pramuka Medaranga, and Ambuj Varshney. 2024. TinyLLM: A Framework for Training and Deploying Language Models at the Edge Computers. Technical Report.

2. Savitha Viswanadh Kandala and Ambuj Varshney. 2024. Your Data, Your Model: A Framework for Training and Deploying Foundational Language Models for Embedded Devices. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking (ACM MobiCom ’24). Association for Computing Machinery, New York, NY, USA, 1704–1706. https://doi.org/10.1145/3636534.3697466

3. Savitha Viswanadh Kandala and Ambuj Varshney. 2024. A Framework for Training and Deploying Foundational Language Models for Embedded Sensing. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking (ACM MobiCom ’24). Association for Computing Machinery, New York, NY, USA, 2236–2238. https://doi.org/10.1145/3636534.3695901