Collaborative-Edge-LAM

Collaborative Edge LAM Framework: A Python Simulation

This repository contains a Python simulation of the framework presented in the research paper: “Collaborative Deployment of Large AI Models on the Edge: A Microservice Approach to Heterogeneous Training and Quantized Inference”.

This project provides a functional, high-level implementation of the core architectural components and algorithms, demonstrating how to manage the lifecycle of Large AI Models (LAMs) in resource-constrained and heterogeneous edge computing environments.

License: MIT · Python 3.8+


Table of Contents

- Problem Statement
- Key Features
- System Architecture
- Project Structure
- Setup and Installation
- How to Run the Simulation
- Core Concepts Implemented
- Simulation Output
- Contributing
- License

Problem Statement

Deploying Large AI Models (LAMs) such as GPT-4 and Gemini for real-time use on Internet-of-Things (IoT) devices is a significant challenge, owing to the severe mismatch between the massive computational and memory requirements of LAMs and the limited resources of edge devices. The problem is compounded by deep system heterogeneity: devices vary widely in computational power (e.g., high-end GPUs vs. low-power MCUs) and in the numerical precisions they support (e.g., FP32 vs. INT8).

This framework provides a unified, modular, and adaptive solution to overcome these barriers.

Key Features

This simulation implements the two core innovations presented in the paper:

- Heterogeneity-Aware Federated Training (FedFT-H, Algorithm 1): each device fine-tunes a LoRA adapter at the rank and precision its tier can afford (e.g., r=16/FP16 on high-end hardware, r=4/INT8 on constrained devices), and the server de-quantizes and rank-aligns the updates before aggregating them.
- Precision-Aware Microservice Inference (Algorithm 2): the model is decomposed into quantized microservice variants that an orchestrator places on precision-compatible devices, using Lyapunov optimization and QoS-based scheduling to parallelize inference.

System Architecture

The framework is coordinated by a central Edge Server that manages two synergistic workflows: Collaborative Training and Dynamic Inference.

+---------------------------------------------------------------------------+
|                        EDGE SERVER (Orchestrator)                         |
|                                                                           |
|  +-----------------------------+             +--------------------------+ |
|  | Heterogeneity-Aware         | Global LoRA | Precision-Aware          | |
|  | Aggregation (Algorithm 1)   |<----------->| Inference Orchestrator   | |
|  | - De-Quantization           |   Update    | (Algorithm 2)            | |
|  | - Adaptive Rank Aggregation |             | - Lyapunov Optimization  | |
|  |                             |             | - QoS-based Scheduling   | |
|  +-----------------------------+             +--------------------------+ |
|        ^             |                                    |               |
| Uplink |             | Downlink                           | Deployment    |
| (LoRA) |             | (LoRA)                             | Results       |
+--------|-------------|------------------------------------|---------------+
         |             |                                    |
         v             v                                    v
+---------------------------------------------------------------------------+
|                      HETEROGENEOUS EDGE NETWORK                           |
|                                                                           |
|  [COLLABORATIVE TRAINING]                  [DYNAMIC INFERENCE]            |
|                                                                           |
|  +------------+  +------------+            +------------+  +------------+ |
|  | Device 1   |  | Device 2   |            | Device 1   |  | Device 3   | |
|  | (H-Tier)   |  | (M-Tier)   |            | (H-Tier)   |  | (L-Tier)   | |
|  | r=16, FP16 |  | r=8, FP16  |            | m1_FP16    |  | m2_INT8    | |
|  +------------+  +------------+            +------------+  +------------+ |
|                                                                           |
|  +------------+  +------------+            +------------+                 |
|  | Device 3   |  | Device 4   |            | Device 2   |                 |
|  | (L-Tier)   |  | (M-Tier)   |            | (M-Tier)   |                 |
|  | r=4, INT8  |  |            |            | m3_FP16    |                 |
|  +------------+  +------------+            +------------+                 |
+---------------------------------------------------------------------------+
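
To make the data flow concrete, the toy sketch below wires the two workflows together. The classes and method names here are illustrative stand-ins invented for this example, not the repository's actual API; the real logic lives in framework/server.py and framework/device.py.

    class Device:
        def __init__(self, name, tier, rank, precision):
            self.name, self.tier = name, tier
            self.rank, self.precision = rank, precision

        def local_update(self):
            # Uplink: a LoRA update tagged with this device's rank and
            # precision, so the server knows how to de-quantize and align it.
            return {"device": self.name, "rank": self.rank,
                    "precision": self.precision}

    class EdgeServer:
        def __init__(self, devices):
            self.devices = devices

        def training_round(self):
            # Collaborative training: gather heterogeneous updates (uplink),
            # fuse them (Algorithm 1), broadcast one global update (downlink).
            updates = [d.local_update() for d in self.devices]
            return {"global_rank": max(u["rank"] for u in updates),
                    "clients": [u["device"] for u in updates]}

        def place(self, microservices):
            # Dynamic inference: put each quantized variant on a device that
            # supports its precision (Algorithm 2, reduced to first-fit here).
            return {ms: next(d.name for d in self.devices
                             if d.precision == ms.split("_")[1])
                    for ms in microservices}

    server = EdgeServer([Device("dev1", "H", 16, "FP16"),
                         Device("dev2", "M", 8, "FP16"),
                         Device("dev3", "L", 4, "INT8")])
    print(server.training_round())
    print(server.place(["m1_FP16", "m2_INT8", "m3_FP16"]))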

Project Structure

The codebase is organized into a modular structure to separate concerns.

collaborative_edge_lam/
├── main.py                     # Main script to run all simulations
├── config.py                   # Central configuration for devices, model, network
└── framework/
    ├── __init__.py
    ├── server.py               # Implements the EdgeServer orchestrator logic
    ├── device.py               # Implements the EdgeDevice client logic
    ├── microservice.py         # Defines microservice variants and the portfolio
    ├── lora.py                 # Helper data class for LoRA updates
    └── utils.py                # Logging and other utility functions
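
For orientation, config.py-style settings for this setup might look like the following. The key names are invented for this sketch and may not match the actual file; the tier, rank, and precision values come from the architecture diagram above (Device 4's rank and precision are not specified there, so they are left unset).

    # Device fleet, mirroring the architecture diagram.
    DEVICES = [
        {"id": 1, "tier": "H", "lora_rank": 16,   "precision": "FP16"},
        {"id": 2, "tier": "M", "lora_rank": 8,    "precision": "FP16"},
        {"id": 3, "tier": "L", "lora_rank": 4,    "precision": "INT8"},
        {"id": 4, "tier": "M", "lora_rank": None, "precision": None},
    ]

    TRAINING_ROUNDS = 50                     # as in the paper's experiments
    MICROSERVICE_VARIANTS = ["m1_FP16", "m2_INT8", "m3_FP16"]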

Setup and Installation

This project's only third-party dependency is NumPy; everything else uses the Python standard library.

  1. Clone the repository:
    git clone https://github.com/ahmadpanah/Collaborative-Edge-LAM.git
    cd Collaborative-Edge-LAM
    
  2. Ensure you have Python 3.8+ installed.

  3. (Optional but recommended) Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

  4. Install the only dependency:
    pip install numpy
    

How to Run the Simulation

To run all three experiments described in the paper, simply execute the main.py script:

python collaborative_edge_lam/main.py

The script will print a detailed log of the simulation process and a summary of the results for each experiment, comparing our framework against the baselines.

Core Concepts Implemented

1. Heterogeneity-Aware Federated Training (FedFT-H)

This workflow, detailed in Algorithm 1 of the paper, allows devices of varying capabilities to train a model together: each device fine-tunes a LoRA adapter at the rank and precision its tier supports, and the server de-quantizes and rank-aligns the incoming updates before fusing them into a single global LoRA update, as sketched below.
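
Here is a minimal NumPy sketch of that aggregation step. It is an illustrative simplification, not the paper's exact formulation: it assumes per-tensor INT8 scales and aligns ranks by zero-padding, and the function names are invented for this example (the simulated version lives in framework/server.py).

    import numpy as np

    def dequantize(tensor, scale):
        # Map an INT8-quantized tensor back to float32 via its per-tensor scale.
        return tensor.astype(np.float32) * scale

    def pad_to_rank(A, B, r_max):
        # Zero-pad a rank-r LoRA pair (A: d x r, B: r x k) up to rank r_max
        # so updates trained at different ranks can be averaged elementwise.
        d, r = A.shape
        k = B.shape[1]
        A_pad = np.zeros((d, r_max), dtype=np.float32)
        B_pad = np.zeros((r_max, k), dtype=np.float32)
        A_pad[:, :r] = A
        B_pad[:r, :] = B
        return A_pad, B_pad

    def aggregate(updates, weights):
        # updates: dicts with keys "A", "B", "rank", "precision", "scale".
        r_max = max(u["rank"] for u in updates)
        sum_A = sum_B = 0.0
        for u, w in zip(updates, weights):
            A, B = u["A"], u["B"]
            if u["precision"] == "INT8":            # step 1: de-quantization
                A, B = dequantize(A, u["scale"]), dequantize(B, u["scale"])
            A, B = pad_to_rank(A, B, r_max)         # step 2: rank alignment
            sum_A, sum_B = sum_A + w * A, sum_B + w * B  # step 3: weighted sum
        return sum_A / sum(weights), sum_B / sum(weights)

    # One FP16-style client at r=16 and one INT8 client at r=4:
    d, k = 64, 64
    hi = {"A": np.random.randn(d, 16).astype(np.float32),
          "B": np.random.randn(16, k).astype(np.float32),
          "rank": 16, "precision": "FP16", "scale": None}
    lo = {"A": np.random.randint(-128, 127, (d, 4)).astype(np.int8),
          "B": np.random.randint(-128, 127, (4, k)).astype(np.int8),
          "rank": 4, "precision": "INT8", "scale": 0.01}
    A_g, B_g = aggregate([hi, lo], weights=[0.5, 0.5])
    print(A_g.shape, B_g.shape)   # (64, 16) (16, 64): global update at r_max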

2. Precision-Aware Microservice Inference

This workflow, detailed in Algorithm 2, enables efficient, parallelized inference: the orchestrator places each quantized microservice variant on a device that supports its precision, balancing load and latency through Lyapunov optimization and QoS-based scheduling.
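
As a rough illustration, here is a greedy stand-in for that placement logic. The drift-plus-penalty score (queue backlog plus V times an estimated execution latency) is a simplified nod to Lyapunov optimization, not the paper's exact objective; the device numbers, field names, and V are assumptions made for this sketch.

    def schedule(microservices, devices, V=1.0):
        # Greedy precision-aware placement: each microservice variant goes to
        # the compatible device with the lowest drift-plus-penalty score
        # (current queue backlog + V * estimated execution latency).
        placement = {}
        for ms in microservices:
            candidates = [d for d in devices
                          if ms["precision"] in d["precisions"]]
            best = min(candidates, key=lambda d: d["queue"]
                       + V * ms["flops"] / d["flops_per_ms"])
            best["queue"] += ms["flops"] / best["flops_per_ms"]  # grow backlog
            placement[ms["name"]] = best["name"]
        return placement

    devices = [
        {"name": "dev1", "precisions": {"FP16", "INT8"},
         "flops_per_ms": 50.0, "queue": 0.0},
        {"name": "dev3", "precisions": {"INT8"},
         "flops_per_ms": 20.0, "queue": 0.0},
    ]
    microservices = [
        {"name": "m1_FP16", "precision": "FP16", "flops": 400.0},
        {"name": "m2_INT8", "precision": "INT8", "flops": 100.0},
    ]
    print(schedule(microservices, devices))
    # {'m1_FP16': 'dev1', 'm2_INT8': 'dev3'}: the INT8 variant lands on the
    # low-tier device once dev1's queue is backed up by the FP16 stage.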

3. Efficient Federated Unlearning

This workflow removes a target device's contribution from the trained global model without retraining from scratch; the third experiment below compares its speed and accuracy against full retraining.
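
One plausible reading of the "Orthogonal Unlearn" method named in the results below is to project the global LoRA factor onto the orthogonal complement of the target client's update subspace. The sketch implements that projection; it is an assumption about the mechanism, not the paper's exact operator (see framework/server.py for the simulated version).

    import numpy as np

    def orthogonal_unlearn(global_A, client_A):
        # Remove the client's influence by projecting the global LoRA factor
        # onto the orthogonal complement of the client's update subspace:
        #   A' = A - Q (Q^T A), with Q an orthonormal basis of client_A.
        Q, _ = np.linalg.qr(client_A)
        return global_A - Q @ (Q.T @ global_A)

    A_global = np.random.randn(768, 16)
    A_client = np.random.randn(768, 4)   # the update to be forgotten
    A_clean = orthogonal_unlearn(A_global, A_client)
    print(np.abs(A_client.T @ A_clean).max())  # ~0: no remaining overlap

Because this is a single matrix operation rather than a new training run, it is consistent with the near-instant time cost reported in the unlearning experiment.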

Simulation Output

Running the main.py script will produce an output similar to the following, summarizing the results of the three experiments.

--- Training Performance Summary (after 50 rounds, as in the paper) ---
Training Strategy           Final Accuracy (%)        Total Comm. Cost (GB)
--------------------------------------------------------------------------------
Full-Model FedAvg           85.5                      ~280.0
Naive Fed-LoRA (r=4)        72.1                      ~5.1
Ours (FedFT-H)              84.9 (Simulated)          ~5.3

FedFT-H achieves near full-model accuracy with >98% communication savings.

--- Inference Performance Summary (4-Step CoT Task) ---
Deployment Strategy         Avg. Latency (ms)         Memory Footprint
--------------------------------------------------------------------------------
Cloud-Centric               850.4                     N/A (cloud-side)
Monolithic Edge             400.0                     14.5 GB (on one device)
Ours (Microservice)         172.0                     4.2 GB (total active)

The microservice approach cuts latency by ~57% versus the monolithic edge baseline through parallelization.

--- Federated Unlearning Efficacy & Cost Comparison ---
State / Method              Model Accuracy (%)        Time Cost
--------------------------------------------------------------------------------
Original Trained Model      84.9                      N/A
Full Retraining             84.5                      ~5 hours
Ours (Orthogonal Unlearn)   84.3 (Simulated)          ~0.01 seconds

The unlearning method is orders of magnitude faster than retraining, with minimal accuracy loss.

Contributing

Contributions are welcome! If you have ideas for improvements, please open an issue to discuss what you would like to change. Pull requests are also appreciated.


License

This project is licensed under the MIT License.