In recent years, the world of distributed computing has been undergoing a significant transformation. The emergence of edge computing and the Internet of Things (IoT) requires new designs for distributed systems.
The Need for Edge Computing
Traditional cloud-based architectures are facing new challenges:
- Bandwidth Demands: The explosion of data-intensive applications, especially video, is straining network capacities.
- Latency Requirements: Many modern applications, such as AR/VR and autonomous vehicles, require ultra-low latency that cloud data centers cannot consistently provide.
- Energy Efficiency: Moving massive amounts of data to and from remote data centers is becoming increasingly energy-intensive and costly.
- Data Sovereignty: Regulatory requirements like GDPR are mandating local data processing in many cases.
Enter edge computing - a new tier of infrastructure deployed closer to end-users and devices, designed to address these challenges.
The Edge Computing Landscape
Edge computing can take many forms:
- Micro data centers in cellular towers
- Compute resources in vehicles or portable devices
- Low-power IoT devices with basic sensing and processing capabilities
This diverse landscape creates new opportunities but also introduces new challenges for distributed system design.
Key Differences from Traditional Distributed Systems
Edge computing environments differ significantly from traditional data center-based systems:
- Scale and Geo-distribution: Edge deployments are much more geographically dispersed.
- Resource Elasticity: Unlike clouds, edge resources are not infinitely elastic.
- Heterogeneity: Edge devices have widely varying capabilities.
- Reliability: Edge networks are often less reliable than data center networks.
- Multi-tenant Nature: Edge infrastructure is more likely to be shared across multiple providers.
These differences necessitate new approaches to distributed system design and programming models.
Case Study: Transactuations for IoT
One example of how distributed computing concepts are being adapted for the edge is "transactuations" - a new programming model for IoT applications that bridges the gap between digital state and physical actuations.
Traditional distributed transactions do not work well in IoT environments because:
- Physical actuations cannot be easily rolled back
- Sensing and actuation have temporal dependencies
- There is a need to maintain consistency between digital state and physical world state
Transactuations introduce new abstractions like sensing policies and actuation policies to handle these challenges, allowing developers to build more robust and consistent IoT applications.
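To make these abstractions concrete, here is a minimal, hypothetical sketch of what a transactuation-style API could look like in Python. The names and semantics (SensingPolicy, ActuationPolicy, the commit behavior) are illustrative assumptions for this lesson, not the actual interface from the transactuations work.

```python
# Hypothetical sketch of a transactuation-style API; names and behavior are
# illustrative assumptions, not the actual implementation from the paper.
import time
from dataclasses import dataclass

@dataclass
class SensingPolicy:
    """Sensor readings must be fresher than max_staleness seconds."""
    max_staleness: float = 30.0

@dataclass
class ActuationPolicy:
    """Commit soft state only if at least this fraction of actuations succeed."""
    min_success_ratio: float = 1.0

class Transactuation:
    def __init__(self, sensing: SensingPolicy, actuation: ActuationPolicy):
        self.sensing = sensing
        self.actuation = actuation
        self.soft_writes = {}   # staged updates to digital (application) state
        self.actuations = []    # staged physical actions; these cannot be rolled back

    def read_sensor(self, name, value, timestamp):
        # Enforce the sensing policy: refuse to act on stale readings.
        if time.time() - timestamp > self.sensing.max_staleness:
            raise RuntimeError(f"sensor {name} reading is too stale")
        return value

    def write_soft(self, key, value):
        self.soft_writes[key] = value   # durable only if the transactuation commits

    def actuate(self, action):
        self.actuations.append(action)  # callable returning True on physical success

    def commit(self, app_state):
        # Issue actuations first and count how many physically succeeded.
        succeeded = sum(1 for act in self.actuations if act())
        ratio = succeeded / len(self.actuations) if self.actuations else 1.0
        if ratio < self.actuation.min_success_ratio:
            return False                 # digital state untouched: stays consistent with the world
        app_state.update(self.soft_writes)  # apply soft writes atomically
        return True
```

The point the sketch tries to capture is that soft (digital) writes become durable only when the actuation policy is satisfied, so the application never records a state that the physical world did not actually reach.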
Future Developments
As edge computing and IoT continue to evolve, we can expect to see:
- New programming models and abstractions tailored for edge environments
- Enhanced security measures to protect distributed edge resources
- Improved orchestration and management tools for heterogeneous edge deployments
- Novel applications that leverage the unique capabilities of edge computing
The rise of edge computing and IoT is not just a trend - it is a fundamental shift in how we approach distributed systems.
Key Concepts
Edge Computing: A distributed computing paradigm that brings computation and data storage closer to the location where it is needed, to improve response times and save bandwidth.
Internet of Things (IoT): A system of interrelated computing devices, mechanical and digital machines, objects, animals or people that are provided with unique identifiers and the ability to transfer data over a network without requiring human-to-human or human-to-computer interaction.
Distributed Systems: A system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another.
Cloud Computing: The delivery of computing services (servers, storage, databases, networking, software, analytics, and intelligence) over the Internet ("the cloud") to offer faster innovation, flexible resources, and economies of scale.
Latency: The time delay between the cause and the effect of some physical change in the system being observed. In networking, it specifically refers to any delay or waiting that increases real or perceived response time beyond the response time desired.
Bandwidth: The maximum rate of data transfer across a given path. It is often used to describe the amount of data that can be transmitted in a fixed amount of time.
Data Sovereignty: The concept that information which has been converted and stored in binary digital form is subject to the laws of the country in which it is located.
Micro Data Center: A smaller, modular data center that can be deployed closer to the edge of a network to reduce latency and improve performance for local users.
Transactuations: A programming model for IoT applications that ensures consistency between digital state and physical world actuations, incorporating concepts like sensing policies and actuation policies.
Sensing Policy: In the context of transactuations, it specifies the conditions under which sensor data can be considered valid for a transactuation, including time windows and required sensor availability.
Actuation Policy: In transactuations, it defines the conditions under which physical actuations are considered successful, including the required percentage of successful actuations before committing a transactuation.
Geo-distribution: The spread of computing resources across multiple geographic locations to improve reliability, reduce latency, and comply with data localization requirements.
Resource Elasticity: The ability of a system to automatically scale up or down based on demand. In cloud computing, this is often considered nearly infinite, while edge computing has more constraints.
Heterogeneity: In edge computing, it refers to the diverse nature of devices and resources in terms of their capabilities, from powerful micro data centers to low-power IoT sensors.
Multi-tenant: An architecture in which a single instance of a software application serves multiple customers (tenants). In edge computing, it often refers to infrastructure shared by multiple service providers or applications.
Orchestration: The automated configuration, coordination, and management of computer systems and software. In edge computing, it involves managing distributed resources across various locations and capabilities.
Review Questions
In Satya et al.'s taxonomy, what are the different tiers of computing infrastructure/systems? What are the unique characteristics of these tiers?
According to Satyanarayanan et al.'s paper "The Computing Landscape of the 21st Century," there are four tiers in the computing taxonomy:
- Tier-1 (Cloud): Characterized by elasticity, permanence, and consolidation. Cloud datacenters provide virtually unlimited compute elasticity, the safest place for data storage with high permanence, and achieve economies of scale through consolidation. They're optimal for large tasks without strict timing, data ingress volume, or privacy requirements.
- Tier-2 (Edge/Cloudlets): Defined by network proximity to Tier-3 devices. Cloudlets are small, dispersed data centers ("data center in a box") that create the illusion of bringing Tier-1 closer to users. They enable low-latency compute offloading from mobile devices and reduce bandwidth demands to Tier-1. They're essentially the same hardware as Tier-1 but engineered differently - focusing on proximity rather than consolidation.
- Tier-3 (Mobile Devices): Characterized by mobility and sensing. These devices (smartphones, wearables, etc.) have strict constraints on weight, size, heat dissipation, and battery life. They're rich in sensors but face a significant "mobility penalty" - a persistent performance gap compared to server hardware. They often offload computation to Tier-1 or Tier-2.
- Tier-4 (IoT Devices): Defined by longevity and opportunism. These devices have no chemical energy source (battery) and instead harvest energy (e.g., from light or RF) to charge a capacitor, enabling brief episodes of sensing, computation, and transmission. They operate in an intermittent computing modality and require immersive proximity to Tier-3 devices, which provide the energy they harvest.
What is the motivation behind edge computing? What are some of the driving use cases? What distinguishes it from current Cloud Computing or even CDNs?
The motivation behind edge computing stems from several factors:
- Network Latency: While originally thought to be the primary driver, this is less important for enterprise use cases today, which can tolerate latencies in the range of hundreds of milliseconds to minutes.
- Bandwidth Constraints: The enormous volume of data generated by sensors, cameras, and other IoT devices (sometimes TBs to PBs per day) makes it impractical and cost-prohibitive to send all this data to the cloud.
- Network Reliability: Many edge deployments are in places with unreliable connectivity to the cloud. Mission-critical applications need to function even during cloud disconnections.
Key driving use cases include:
- Business intelligence (retail, restaurants, gas stations)
- Smart cities and construction site monitoring
- Intelligent transportation (aviation, railway, road control)
- Industrial plants (oil refineries, manufacturing, agriculture)
Edge computing differs from cloud computing in that it places compute resources closer to data sources rather than centralizing them. It differs from CDNs in scale and purpose - CDNs have about 1-2 orders of magnitude fewer deployment points (thousands globally) compared to edge infrastructure like cellular towers (hundreds of thousands in the US alone). While CDNs primarily deliver content, edge computing provides general-purpose computing closer to users.
What are some assumptions of distributed systems solutions developed for datacenters, or even geo-distributed datacenters, that are no longer valid when considering edge computing? Why?
Several assumptions that work for datacenter distributed systems break down in edge computing:
- Scale and Geo-distribution: Datacenter protocols assume tightly coupled components despite large numbers. When used at the edge, these protocols generate excessive chatter (timeouts, heartbeats) over wide-area networks, creating significant overheads.
- Elasticity: Datacenter solutions assume nearly limitless resources. The edge is not elastic - if resources are not available at a specific edge location, the service is unavailable. Unlike in datacenters, you cannot simply run workloads on another CPU elsewhere.
- Network Reliability: Datacenter networks use reliable, high-performance technologies. Edge networks face much higher degrees of device churn, mobility, and unreliability, making datacenter fault-tolerance solutions ineffective or introducing excessive overhead.
- Heterogeneity: Edge devices have vastly different compute and communication capabilities, while datacenter designs assume symmetry among nodes.
- Resource Ownership: Edge resources are often operated by multiple providers, unlike datacenters, which are typically run by a single provider.
- Security Assumptions: Security assumptions differ significantly at the edge but are often ignored when cloud techniques are reused there.
Why are distributed transactions insufficient to perform consistent state updates in distributed IoT-based systems at the edge?
Distributed transactions are insufficient for IoT-based systems at the edge because:
- Physical World Interactions: Traditional distributed transactions use redo or undo logs to ensure consistent execution, but this does not work for actuations in the physical environment. For example, you cannot "undo" an alarm being triggered, and physical actuations cannot be rolled back.
- Dependencies with the Physical World: IoT applications have complex dependencies between sensing, application state, and actuation that traditional transactions cannot handle. The paper identifies three critical dependencies:
  - Actuation actions dependent on sensed values
  - Updates to application state dependent on sensed values
  - Dependencies among updates to application state and actuation
- Timing and Staleness: Sensor readings have temporal validity that traditional transactions do not account for.
How does the transactuation concept relate to transactions? What are some additional features that are introduced in transactuations that make it possible to build correct IoT-based distributed applications?
Transactuations build upon the transaction concept but add crucial features for IoT environments:
- Sensing and Actuation Policies: Transactuations introduce sensing policies (determining when a transactuation can execute based on sensor data freshness) and actuation policies (specifying conditions under which soft state updates can commit despite failures).
- Time Window Validation: Transactuations specify time windows for sensor readings to be considered valid.
- Selective Atomic Durability: Unlike transactions that ensure atomic durability for all operations, transactuations guarantee atomic durability only for soft writes, not hard writes (actuations). If a hard write fails, the transactuation can still commit by enforcing consistency between soft and hard states.
- Failure Handling: Transactuations define explicit behaviors for success and failure cases (onSuccess and onFailure lambdas).
- T-Chains: Transactuations can be chained together sequentially, ensuring that if a transactuation depends on another's results, they execute in the correct order.
These features enable IoT applications to properly handle sensing policies (e.g., all/any/none policies for sensor availability), actuation policies (handling failed actuations), and maintain consistency between physical state and application state.
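A hypothetical Python rendering of the failure handlers and chaining might look like the following; the function and handler names are illustrative assumptions, not the actual transactuation API.

```python
# Hypothetical sketch of failure handlers and chaining (names are illustrative).
def run_transactuation(perform, on_success, on_failure):
    """Run `perform`; invoke exactly one of the two handlers with its outcome."""
    try:
        result = perform()
    except Exception as exc:
        on_failure(exc)
        return None
    on_success(result)
    return result

def heater_decision():
    # First transactuation: read temperature, decide whether to actuate the heater.
    temperature = 17            # pretend this came from a fresh, policy-valid sensor reading
    return {"heater": temperature < 20}

def log_and_chain(result):
    # Chained transactuation: launched only after the first one has committed,
    # so it observes the heater decision rather than racing with it.
    run_transactuation(lambda: {"log": result},
                       on_success=lambda r: print("logged", r),
                       on_failure=lambda e: print("log failed:", e))

run_transactuation(heater_decision,
                   on_success=log_and_chain,
                   on_failure=lambda e: print("notify user:", e))
```

Because the second transactuation is started only from the first one's success handler, it sees the committed result rather than racing with it, which is the ordering guarantee T-Chains provide.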
In the evaluation of the system, why did the authors pick the specific metrics we mention in the Lesson, what did they want to demonstrate/learn about Transactuations by using these metrics?
The authors evaluated Transactuations using three primary metrics:
- Programmability (Lines of Code): By comparing LOC across Best Effort (BE), Best Effort with Consistency (BE+Con), and Transactuation (TN) implementations, they wanted to demonstrate that Transactuations provide a more concise and developer-friendly way to build reliable IoT applications. Their results showed that TN implementations had comparable or fewer LOC than BE, despite offering stronger guarantees, and significantly fewer than BE+Con.
- Correctness: The authors evaluated applications' behavior under failure conditions (unavailable sensors and failed actuations) to demonstrate that Transactuations could effectively handle these scenarios and maintain correct application behavior. They showed that Transactuations could either resolve issues automatically or detect them and notify users appropriately.
- Overhead: By measuring execution times in failure-free and failure scenarios, they wanted to quantify the overhead introduced by Transactuations compared to simpler implementations. This helped establish whether Transactuations were practical for real-world deployment. They found that Transactuations introduced a moderate overhead (about 1.5x slowdown) in the failure-free case, which they deemed acceptable given the correctness guarantees provided.
These metrics collectively demonstrated that Transactuations offer a good balance of programmability, correctness guarantees, and reasonable performance overhead, making them suitable for building reliable IoT applications in edge environments.