Degree Name
MS (Master of Science)
Program
Computer Science
Date of Award
5-2026
Committee Chair or Co-Chairs
Brian T. Bennett
Committee Members
Shehenaz Shaik, Mathew Desjardins
Abstract
The advancement of Large Language Models (LLMs) has fundamentally changed the nature of natural language processing. The substantial memory requirements of frontier models create a significant barrier to entry and centralize inference. This thesis presents the design and implementation of a distributed inference framework that democratizes LLM access by leveraging commodity hardware. The framework combines the resources of heterogeneous commercial off-the-shelf (COTS) devices into a unified compute pool, enabling inference of models that exceed a single device's memory capacity. A novel Task Partitioning Engine (TPE) analyzes model architectures, profiles node capabilities, and supports both pipeline and expert parallelism strategies. The primary contribution is a fault tolerance mechanism that detects node failures during active inference via heartbeat monitoring, automatically recovers lost model shards, and resumes generation from the point of failure with zero token loss. Evaluation on a cluster of heterogeneous devices demonstrates successful distributed inference and validates mid-inference recovery following node failure.
Document Type
Thesis - unrestricted
Recommended Citation
Dunn, Brycen E., "Durable, Distributed LLM Inference on COTS Devices" (2026). Electronic Theses and Dissertations. Paper 4666. https://dc.etsu.edu/etd/4666
Copyright
Copyright by the author.