Implement your Data Access Layer with Combine

Write robust and maintainable software using modern language features

Aug 15, 2023

We're in the home stretch now. In the penultimate chapter in our async testing odyssey, we're doing something a bit different to prepare for the grand finale.

Part I Dependency Injection Demystified
Part II Mocking like a Pro
Part III Unit Testing with async/await
Part IV Advanced async testing: Unstructured concurrency
Part V (interlude) - Implement your Data Access Layer with Combine
Part VI Combine, async/await, and Unit Testing

Part V: (interlude) - Implement your Data Access Layer with Combine

Apple released Combine in 2019. The framework underpins much of the reactive nature of SwiftUI, but is also a powerful tool in its own right. We're going to take a break from testing in this chapter, and I'm going to get us all up to the same level of understanding in preparation for Part VI - Combine, async/await, and Unit Testing.

Today, we're going to:

Explore the benefits of a data access layer.
Learn all about the Repository pattern.
Gain an introduction to the basics of the Combine framework.
Finally, we'll pull these concepts together to show you how you can implement your own data access layer with Combine.

The Data Access Layer

Architecture Overview

Let's revisit our nifty architecture diagram for Bev from Part I - Dependency Injection Demystified. In our modular architecture, the key layers are:

UI / Presentation Layer - to display UI and handle events
Data Access Layer - to define what data we want to get
Network Layer - to deal with how we get this data

Figure: High-level architecture diagram for Bev, showing the direction of data flow from events (i.e. user actions), down through the data access layer, to a network request and the wider internet, then back up the layers to update the UI with model data.

Data access as an abstraction

The Data Access Layer is an abstraction to make life easy for the UI layer above it.

The interface of the data access layer could be as simple as this:

public protocol DataAccessLayer {
    func getDataModels() async throws -> [Models]
}

The data access layer is an interface that promises, among other things, to give you some data. The key is that it doesn't tell you how it is getting the data you want - it's on a need-to-know basis, and the consumer of the API doesn't need to know.

The innards of the Data Access Layer are non-public; the nuts and bolts are an implementation detail, which the UI layer (and, perhaps, your front-end developer) is freed from worrying about.

Implementation details

Inside the Data Access Layer, however, we do care about the implementation details for fetching the data. This is where it might get interesting - there are lots of ways we might want to retrieve data in an app:

Fetch from network
- Makes HTTP request to internet
- Very slow speed (0.01 to 10 seconds)
Fetch from local persistence
- Reads from disk
- Medium speed (0.1 to 1 milliseconds)
Fetch from local cache
- Reads from RAM
- Very fast speed (10 to 100 nanoseconds)

Now we can realise the beauty of a Data Access Layer - we can use any and all of these approaches to fetching data, and synthesise a result which gives the consumer of our API the most up-to-date data, as quickly as possible.

We might first check our local cache for the data we want, to see if we can deliver it instantly. If it's not already here, we check our local persistence. Finally, as a last resort, we can fetch data from the network. While it's by far the slowest method, an HTTP call ensures the data returned to the user is up-to-date. We can then persist and cache the data for fast retrieval later.
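The cache-then-disk-then-network chain described above could be sketched roughly like this. Everything here is hypothetical and for illustration only: Model, DataSource, WritableDataSource, InMemoryStore and CascadingDataAccess are made-up names, not types from Bev, and a real persistence layer would read from disk rather than reuse an in-memory store.

```swift
// Hypothetical model type, for illustration only.
struct Model: Equatable, Sendable {
    let id: Int
}

/// Anything we can read models from: RAM, disk, or the network.
protocol DataSource: Sendable {
    /// Returns nil when this source has nothing stored.
    func read() async throws -> [Model]?
}

/// A source we can also write to, so fresh data can be stored for next time.
protocol WritableDataSource: DataSource {
    func write(_ models: [Model]) async
}

/// A minimal in-memory store; the actor keeps access thread-safe.
actor InMemoryStore: WritableDataSource {
    private var stored: [Model]?

    func read() async throws -> [Model]? { stored }
    func write(_ models: [Model]) async { stored = models }
}

struct CascadingDataAccess {
    let cache: any WritableDataSource        // fastest
    let persistence: any WritableDataSource  // slower
    let network: any DataSource              // slowest, but freshest

    func getDataModels() async throws -> [Model] {
        // 1. Try the in-memory cache first.
        if let cached = try await cache.read() {
            return cached
        }
        // 2. Fall back to persistence, warming the cache on a hit.
        if let persisted = try await persistence.read() {
            await cache.write(persisted)
            return persisted
        }
        // 3. Last resort: the network. Persist and cache the result.
        let fresh = try await network.read() ?? []
        await persistence.write(fresh)
        await cache.write(fresh)
        return fresh
    }
}
```

The caller only ever sees getDataModels(); which of the three sources answered is invisible from the outside.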

Retrieval strategies

We can even perform all these fetches simultaneously and generate a single result to return to the UI layer. We can update our interface to allow consumers to define a retrieval strategy such as:

Returning any data as quickly as possible
Returning only the most up-to-date data from the network, but using local storage as a fallback
Returning data as fast as possible, but potentially returning more than once (e.g. returning data from the cache, then from the network once it returns)

Here's a basic way to express these strategies in the interface:

public enum DataAccessStrategy {
    case fastestAvailable
    case upToDateWithFallback
    case returnMultipleTimes
}

public protocol DataAccessLayer {
    func getDataModels(strategy: DataAccessStrategy) async throws -> [Models]
}
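One wrinkle worth noting: an async function returns exactly once, so a strategy that delivers data more than once needs a different return shape. One possible sketch, which is not how Bev does it, returns an AsyncThrowingStream instead. StreamingDataAccess and its two closures (readCache and fetchNetwork) are hypothetical stand-ins for a real cache and API client, and Model is a placeholder type.

```swift
struct Model: Equatable, Sendable { let id: Int }

enum DataAccessStrategy: Sendable {
    case fastestAvailable
    case upToDateWithFallback
    case returnMultipleTimes
}

/// Hypothetical implementation; the closures stand in for a real cache and network client.
struct StreamingDataAccess: Sendable {
    let readCache: @Sendable () async -> [Model]?
    let fetchNetwork: @Sendable () async throws -> [Model]

    func dataModels(strategy: DataAccessStrategy) -> AsyncThrowingStream<[Model], Error> {
        AsyncThrowingStream { continuation in
            let task = Task {
                do {
                    switch strategy {
                    case .fastestAvailable:
                        // Prefer the cache; only hit the network on a miss.
                        if let cached = await readCache() {
                            continuation.yield(cached)
                        } else {
                            let fresh = try await fetchNetwork()
                            continuation.yield(fresh)
                        }
                    case .upToDateWithFallback:
                        // Prefer the network; fall back to the cache if it fails.
                        do {
                            let fresh = try await fetchNetwork()
                            continuation.yield(fresh)
                        } catch {
                            guard let cached = await readCache() else { throw error }
                            continuation.yield(cached)
                        }
                    case .returnMultipleTimes:
                        // Yield cached data right away, then fresh data once it arrives.
                        if let cached = await readCache() {
                            continuation.yield(cached)
                        }
                        let fresh = try await fetchNetwork()
                        continuation.yield(fresh)
                    }
                    continuation.finish()
                } catch {
                    continuation.finish(throwing: error)
                }
            }
            // Stop the work if the consumer stops listening.
            continuation.onTermination = { _ in task.cancel() }
        }
    }
}
```

A consumer would iterate with for try await models in dataAccess.dataModels(strategy: .returnMultipleTimes), receiving one or two arrays depending on what the cache holds. As we'll see shortly, a Combine publisher is another way to model "more than one result".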

The Repository pattern

The Repository pattern is an approach for implementing a Data Access Layer in your app. The core idea is to manage data access logic in a centralised location. This means a Repository can collect together multiple data sources with different underlying implementations - for instance, we might fetch from an in-memory cache, a persistence layer, or over the network.

The resulting abstraction allows our Repository interface to return an approximation to in-memory objects, which your UI layer can handle with no trouble at all.

This is only an approximation because while we would love to instantly return in-memory objects every time, sometimes things take longer than expected or go wrong.
Our interface - the API contract we are promising consumers - is hence marked async throws. async ensures the worst-case scenario for retrieval speed over the network is accounted for, allowing the Swift runtime to suspend execution at the call site, and throws ensures that consumers know they should plan to handle potential errors.

In this example we will implement both an in-memory cache and a network store, but perhaps I'll upgrade this with a persistence layer and retrieval strategies once I start digging into SwiftData.

Implementing our Repository with Combine

Combine - what you need to know

Combine is a functional reactive programming framework released by Apple in 2019, which provides a declarative API for processing values asynchronously. Much of SwiftUI uses the Combine framework as an implementation detail, most notably the @Published properties you put in your view models.

To avoid massively increasing the scope of this already quite unwieldy 6-parter, read this SwiftLee article for an overview of the basics.

The primary concept we're going to use is CurrentValueSubject, which does two things:

1. Broadcasts a notification to all its subscribers when its value is updated (just like its sibling, the PassthroughSubject)

2. Stores the most recently broadcast value.

You might already start to see how we could utilise this - #2 gives us an in-memory cache out-of-the-box!
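To make the difference between the two subjects concrete, here's a small sketch using plain Int values. The variable names are arbitrary; only CurrentValueSubject, PassthroughSubject, send, sink and value come from Combine itself.

```swift
import Combine

// A CurrentValueSubject needs an initial value, and always holds one.
let current = CurrentValueSubject<Int, Never>(0)
// A PassthroughSubject stores nothing; it only forwards values.
let passthrough = PassthroughSubject<Int, Never>()

current.send(1)
passthrough.send(1) // nobody is subscribed yet, so this value is dropped

var fromCurrent: [Int] = []
var fromPassthrough: [Int] = []

// Subscribing late: `current` replays its stored value straight away.
let currentSubscription = current.sink { fromCurrent.append($0) }
let passthroughSubscription = passthrough.sink { fromPassthrough.append($0) }

current.send(2)
passthrough.send(2)

print(current.value)    // 2
print(fromCurrent)      // [1, 2]
print(fromPassthrough)  // [2]
```

The late subscriber to the CurrentValueSubject still receives 1, because the subject remembered it. That replay is the "cache" behaviour.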

Repository Interface

Let's go back to Bev. We've used Combine in the Repository module to implement our data access layer via the BeerRepository interface:

public protocol BeerRepository {
    var beersPublisher: CurrentValueSubject<LoadingState<[Beer]>, Never> { get }
    func loadBeers() async
}

Our BeerRepository has the async loadBeers() method with which, after our journey through Part IV, we should all be well acquainted. It also has a beersPublisher property which exposes a public getter for the CurrentValueSubject, allowing API consumers to read its current value and subscribe to the values it broadcasts via the sink subscriber.

One thing to appreciate here - this interface is extremely minimal. We can wrap all kinds of complex data access business logic underneath and the UI layer is none the wiser!

If we wanted to add writes so we can update Beers, we would only need to add something like write(beers: [Beer]) async throws to the interface and trust the implementation to re-publish the updated values via the CurrentValueSubject after writing.

LoadingState here is a simple wrapper enum that works like a souped-up Result type - the idle and loading cases let the API users know whether we are waiting for a value or not, as well as returning the success state with the desired values or any errors:

public enum LoadingState<T> {
    case idle
    case loading
    case success(T)
    case failure(Error)
}
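A consumer typically unpacks a LoadingState with an exhaustive switch. As a rough illustration, here's a hypothetical statusText(for:) helper (not part of Bev) that turns each case into a message a UI could display:

```swift
enum LoadingState<T> {
    case idle
    case loading
    case success(T)
    case failure(Error)
}

/// Hypothetical helper that maps a loading state to a status message.
func statusText<T>(for state: LoadingState<[T]>) -> String {
    switch state {
    case .idle:
        return "Nothing loaded yet"
    case .loading:
        return "Loading..."
    case .success(let items):
        return "Loaded \(items.count) item(s)"
    case .failure(let error):
        return "Something went wrong: \(error)"
    }
}
```

Because the switch is exhaustive, adding a new case to LoadingState later makes the compiler flag every consumer that needs updating.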

Repository Implementation

Since we aren't implementing persistence or any clever retrieval strategies here, the full implementation of our BeerRepository is pretty brief:

import Combine

public final class BeerRepositoryImpl: BeerRepository {

    // 1
    public private(set) var beersPublisher = CurrentValueSubject<LoadingState<[Beer]>, Never>(.idle)

    // 2
    private let api: BeerAPI

    // 3
    public init(api: BeerAPI = BeerAPIImpl()) {
        self.api = api
    }

    // 4
    public func loadBeers() async {
        // 5
        beersPublisher.send(.loading)

        do {
            // 6
            let beers = try await api.getBeers()
            beersPublisher.send(.success(beers))

        } catch {
            // 7
            beersPublisher.send(.failure(error))
        }
    }
}

Let's step through each piece of this in turn:

1. To start with, we add the beersPublisher property to ensure protocol conformance. We mark this as public private(set) since we need it available to API consumers outside this module, but don't want anything outside this class to replace the CurrentValueSubject instance that others may already be subscribed to. Since it holds a LoadingState, we initialise it with .idle so consumers know there's nothing to see here to start with.
2. We have an API dependency which is kept private - it's an implementation detail which consumers of our interface don't need to know about.
3. Our initialiser is public so consumers in other modules can create instances. It uses a BeerAPIImpl() instance by default, but lets us override this and inject a mock version of the API dependency.
4. We complete our protocol conformance with the async loadBeers() method.
5. When loading begins, we send a .loading state to the beersPublisher. This value is broadcast to all its subscribers, enabling our UI layer to show a loading indicator while the user waits.
6. We ask our API dependency to fetch some beer data over the network. If this works, we can then send the array of Beers to our publisher, which broadcasts the values wrapped in a .success state.
7. Finally, we handle the unhappy path. Here, we're simply passing the error to the consumer of our interface to handle, again wrapped in a .failure state. The UI layer checks for these errors and handles them in a way that's helpful to the user.
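Because the API dependency is injected, the repository can be exercised without touching the network. Here's a rough sketch of what that might look like. The shapes of Beer and BeerAPI aren't shown in this article, so the definitions below are guesses, and MockBeerAPI and loadWithMock are hypothetical helpers. The repository is a condensed copy of the one above so that the example stands alone.

```swift
import Combine

// Assumed shapes; the real Bev definitions aren't shown in this article.
struct Beer: Equatable { let name: String }

protocol BeerAPI {
    func getBeers() async throws -> [Beer]
}

enum LoadingState<T> {
    case idle, loading, success(T), failure(Error)
}

// Condensed copy of the repository above, so the sketch is self-contained.
final class BeerRepositoryImpl {
    private(set) var beersPublisher = CurrentValueSubject<LoadingState<[Beer]>, Never>(.idle)
    private let api: BeerAPI

    init(api: BeerAPI) { self.api = api }

    func loadBeers() async {
        beersPublisher.send(.loading)
        do {
            let beers = try await api.getBeers()
            beersPublisher.send(.success(beers))
        } catch {
            beersPublisher.send(.failure(error))
        }
    }
}

// Hypothetical mock that returns a canned result instead of hitting the network.
struct MockBeerAPI: BeerAPI {
    var result: Result<[Beer], Error>
    func getBeers() async throws -> [Beer] { try result.get() }
}

/// Loads beers through a repository backed by the mock and returns the final state.
func loadWithMock(_ result: Result<[Beer], Error>) async -> LoadingState<[Beer]> {
    let repository = BeerRepositoryImpl(api: MockBeerAPI(result: result))
    await repository.loadBeers()
    return repository.beersPublisher.value
}
```

Calling loadWithMock(.success([...])) should leave the publisher holding a .success state, while a .failure input should surface as .failure, with no real network involved in either case.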

Consuming the values

In our BeerViewModel, we have a simple subscription set up to read the values broadcast from the publisher.

@MainActor
final class BeerViewModel: ObservableObject {

    // ...

    // 1
    private var cancelBag = Set<AnyCancellable>()    

    // 2
    private let repository: BeerRepository

    // 3
    init(repository: BeerRepository = BeerRepositoryImpl()) {
        self.repository = repository
        setupBeerListener(on: repository)
    }

    // 4
    private func setupBeerListener(on repo: BeerRepository) {
        repo.beersPublisher
            .receive(on: RunLoop.main)
            .sink(receiveValue: { [weak self] in
                self?.handleBeer(loadingState: $0)
            }).store(in: &cancelBag)
    }

    // ...
}

Let's briefly go over the main moving parts here:

1. Our cancelBag stores our subscriptions, keeping them alive for as long as the view model exists.
2. Our repository is a private property here.
3. With the same approach used in our Repository, the initialiser takes a repository as an argument to allow easy DI during our tests, and instantiates our standard BeerRepositoryImpl as the default argument. The initialiser then calls setupBeerListener(on: repository).
4. This private method takes the beersPublisher property on the repo and sets up a subscription to it. Values are received on the main RunLoop so that any resulting UI updates happen on the main thread, then passed to sink, where we handle the LoadingState in another method. Finally, the subscription is stored in the cancelBag.

Advantages of this approach

The Combine-based approach is beneficial because when sharing the Repository among multiple potential consumers - that is, modules and view models in your app - the subscriptions you create allow the most up-to-date value to be broadcast everywhere it is needed automatically, every time it loads. It also allows for a logical separation of loading the data and of handling the result.

Almost as a side-effect, you get caching for free via CurrentValueSubject - but because sending .loading overwrites the stored value, you may want to keep a separate cache that doesn't get wiped when the state changes to loading, and use a PassthroughSubject instead.
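As a rough idea of what that alternative might look like, here's a sketch in which state changes flow through a PassthroughSubject while the last successful payload lives in its own property. CachingBeerStore, cachedBeers and the fetch closure are hypothetical names for illustration, and Beer is an assumed shape.

```swift
import Combine

// Assumed shape; the real Beer model isn't shown in this article.
struct Beer: Equatable { let name: String }

enum LoadingState<T> {
    case idle, loading, success(T), failure(Error)
}

/// Hypothetical variant: events go through a PassthroughSubject,
/// while the last successful result is cached separately.
final class CachingBeerStore {
    let beersPublisher = PassthroughSubject<LoadingState<[Beer]>, Never>()
    private(set) var cachedBeers: [Beer]?

    // Stand-in for the real API dependency.
    private let fetch: () async throws -> [Beer]

    init(fetch: @escaping () async throws -> [Beer]) {
        self.fetch = fetch
    }

    func loadBeers() async {
        // Announcing .loading no longer touches the cached beers.
        beersPublisher.send(.loading)
        do {
            let beers = try await fetch()
            cachedBeers = beers
            beersPublisher.send(.success(beers))
        } catch {
            // The previous cache survives a failed reload.
            beersPublisher.send(.failure(error))
        }
    }
}
```

The trade-off is that new subscribers no longer get a value replayed on subscription; they would read cachedBeers themselves, or wait for the next event.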

Conclusion

In this article I hope I've successfully evangelised the benefits of using a Data Access Layer in your own applications to separate two concerns: the data you need to get, and how to get it. You might have a better idea of how you could utilise this to make your own APIs easier to consume, as well as how to implement retrieval strategies to allow your UI layer to optimise for speed, correctness, or both. Finally, you gained a brief overview of Combine and how you might use it to implement your own Data Access Layer using the Repository pattern.

If you're reading everything sequentially, I hope this article was a nice change of pace away from testing. It was a very purposeful segue, however, because the next chapter is all about how to unit test your project when using Combine and async/await together in the same codebase.

We're finally ready to bring all our learning together in Part VI - Combine, async/await, and Unit Testing.

Jacob’s Tech Tavern