Using automatic API request retries to make iOS apps more resilient

This technique is demonstrated within the context of a SwiftUI application, but the real topic of discussion is the NetworkClient singleton used by the SwiftUI app. The technique applies the same way in UIKit or AppKit apps, so it should be just as relevant there.

App Architecture

The overall app for this post has the following components:

  1. A front-end UI built in SwiftUI, which is found in ContentView.swift
  2. A ViewModel which provides responsive updates to the UI, found in ContentViewModel.swift
  3. A singleton service that fetches data from an API, found in NetworkClient.swift.

The main focus for this discussion is the design of the NetworkClient singleton. In this simple application, NetworkClient's only job is to fetch a current weather sample from a backend service, deserialize it into a Weather object (defined in Weather.swift), and return it to the UI for display.

Note that in order to keep sample code simple and easy to review, the code in this sample project lacks what I'd call "reasonable error handling". A production-ready application would provide more error checking and provide better error messages.

Fetching Data with URLSession (without retries)

A simple network client that meets the needs of this demo app might look as follows.

class NetworkClient {
    // A static let is initialized lazily and exactly once.
    static let sharedInstance = NetworkClient()
    
    let session = URLSession(configuration: 
                      URLSessionConfiguration.default, 
                      delegate: nil, delegateQueue: nil)
    
    // MARK: - Fetching without retry
    func fetchWithoutRetry(url: String, 
                    completion: @escaping (_ weather: Weather?, 
                                            _ error: String?) -> Void) {
                                            
        var request = URLRequest(url: URL(string: url)!)
        request.httpMethod = "GET"
        
        let task = session.dataTask(with: request) { 
                                      (data, response, error) in
            let statusCode = 
                  (response as? HTTPURLResponse)?.statusCode ?? -1
            
            if let error = error {
                completion(nil, error.localizedDescription)
                return
            }
            
            if (200...299).contains(statusCode), 
                                          let data = data {
                
                let weather = try? JSONDecoder()
                                      .decode(Weather.self, from: data)
                completion(weather, nil)
            } else {
                completion(nil, "Error encountered: \(statusCode)")
            }
        }
        task.resume()
    }
}

This NetworkClient provides only one method: fetchWithoutRetry(..). The method makes a connection to a web service endpoint, GETs a JSON response, and deserializes it into a Weather object. Once deserialized, the Weather object is returned asynchronously to the UI via a closure.

The main job of the NetworkClient.fetchWithoutRetry function is to deserialize weather data into this struct, which the app uses as the data source for the UI:

struct Weather : Decodable {
    let name: String
    let temp: Double
    let feelsLike: Double
    let tempMin: Double
    let tempMax: Double
    let pressure: Double
    let humidity: Double
    let visibility: Double
    let windSpeed: Double
    let windDirection: Double
}
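
To make the deserialization step concrete, here is a minimal sketch that decodes a made-up payload into the same struct. The JSON below is an assumption for illustration only: it assumes the backend's key names match the property names exactly, which the real service may or may not do.

```swift
import Foundation

// Mirrors the Weather struct from Weather.swift.
struct Weather: Decodable {
    let name: String
    let temp: Double
    let feelsLike: Double
    let tempMin: Double
    let tempMax: Double
    let pressure: Double
    let humidity: Double
    let visibility: Double
    let windSpeed: Double
    let windDirection: Double
}

// Hypothetical payload; the values and key names are invented for this example.
let sampleJSON = """
{
  "name": "Austin",
  "temp": 71.5,
  "feelsLike": 70.5,
  "tempMin": 68.0,
  "tempMax": 75.0,
  "pressure": 1015,
  "humidity": 64,
  "visibility": 10000,
  "windSpeed": 5.5,
  "windDirection": 180
}
"""

// Same call shape as in fetchWithoutRetry: try? turns a decoding error into nil.
let decoded = try? JSONDecoder().decode(Weather.self, from: Data(sampleJSON.utf8))

// A payload that is missing required keys decodes to nil instead of throwing.
let incomplete = try? JSONDecoder().decode(Weather.self,
                                           from: Data("{\"name\": \"Austin\"}".utf8))
```

Because the sample uses try?, a payload that fails to decode reaches the completion closure as a nil Weather with a nil error message. A production client would probably want to report that case explicitly.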

In the event of an error, or if the HTTP response code isn't in the 200-299 range, an error message is displayed.

If the sample application is run with this version of the NetworkClient, it produces the output below.

  • On the left is the output when a valid URL is requested from the web service. The current weather temperature is fetched and displayed on screen normally.
  • On the right is the output when an invalid URL is requested. The first request fails with a 404 (not found) error, and a message indicating the response code is displayed.

Not bad, but we can do better

This version of the network client is fine. If the server never has trouble responding to requests, and the client's cellular network never has a glitch that causes a request to fail unexpectedly, then all is well.

But all isn't always well. Unexpected and intermittent issues can arise:

  • A temporary service outage may be occurring in the backend as our request is dispatched
  • A load balancer may hit a dead node on the backend
  • A cellular connection could be spotty on the client side (user is inside a building, at a football game, etc.)

With mobile apps, it's best to plan for failures and handle them as gracefully as possible--ideally without the user noticing.

The error message is helpful, and the user can press the "Network Fetch" button to retry. So this design is "pretty good". But we can make it better.

What if we magically pressed the retry button *for* the user behind the scenes, assuming that when an error is encountered, one of the above *temporary* disruptions is the root cause? Perhaps a retry (or two, or three) would clear the issue without disrupting the user's experience or pulling attention away from the main flow of the app.
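
The assumption that a failure is temporary can also be written down in code. The sketch below is one hypothetical way to do that; the function name and the particular status codes and URLError codes chosen are my assumptions, not part of the sample project, and a real app may want a different list.

```swift
import Foundation

// Hypothetical helper, not part of the sample project.
// Assumption: these status codes usually point at a temporary condition.
func isLikelyTransient(statusCode: Int) -> Bool {
    switch statusCode {
    case 408, 429:             // request timeout, too many requests
        return true
    case 500, 502, 503, 504:   // server error, bad gateway, unavailable, gateway timeout
        return true
    default:                   // e.g. 404: asking again will not change the answer
        return false
    }
}

// Assumption: these transport errors are typical of a spotty connection.
func isLikelyTransient(error: URLError) -> Bool {
    switch error.code {
    case .timedOut, .networkConnectionLost, .notConnectedToInternet, .cannotConnectToHost:
        return true
    default:
        return false
    }
}
```

A check like this could gate the decision to retry, so that permanent failures are reported right away instead of being retried.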

NetworkClient with Retries

The following alternative service adds automatic retries to the NetworkClient service. It's a bit longer, but if you study it carefully, it's not fundamentally different.

class NetworkClient {
  // A static let is initialized lazily and exactly once.
  static let sharedInstance = NetworkClient()
  
  let session = URLSession(configuration: 
                    URLSessionConfiguration.default, 
                    delegate: nil, delegateQueue: nil)

  // MARK: - Fetching with retry
  
  func fetchWithRetry(url: String, 
                  completion: @escaping (_ weather: Weather?, 
                                        _ error: String?) -> Void) {
                                        
      var request = URLRequest(url: URL(string: url)!)
      request.httpMethod = "GET"
      
      requestWithRetry(with: request) { 
                    (data, response, error, retriesLeft) in
                    
          if let error = error {
              completion(nil, "\(error.localizedDescription) with \(retriesLeft) retries left")
              return
          }
          
          let statusCode = (response as? HTTPURLResponse)?.statusCode ?? -1
          
          if (200...299).contains(statusCode), let data = data {
              let weather = try? JSONDecoder().decode(
                                    Weather.self, from: data)
              completion(weather, nil)
          } else {
              completion(nil, "Error encountered: \(statusCode) with \(retriesLeft) retries left")
          }
      }
  }
  
  // **** This function is recursive, and will automatically retry
  private func requestWithRetry(with request: URLRequest, 
                                retries: Int = 3,
                                completionHandler: @escaping 
                                      (Data?, URLResponse?, Error?, 
                                        _ retriesLeft: Int) -> Void) {
      
      let task = session.dataTask(with: request) { 
                                  (data, response, error) in
          if error != nil {
              completionHandler(data, response, error, retries)
              return
          }
          
          let statusCode = (response as? HTTPURLResponse)?.statusCode ?? -1
          
          if (200...299).contains(statusCode) {
              completionHandler(data, response, error, retries)
          } else if retries > 0 {
              print("Received status code \(statusCode) with \(retries) retries remaining. RETRYING VIA RECURSIVE CALL.")
              self.requestWithRetry(with: request, 
                            retries: retries - 1, 
                            completionHandler: completionHandler)
          } else {
              print("Received status code \(statusCode) with \(retries) retries remaining. EXIT WITH FAILURE.")
              completionHandler(data, response, error, retries)
          }
      }
      task.resume()
  }    
}

The main difference is the introduction of the requestWithRetry function. This function is recursive: it calls itself when the server responds with a status code outside the 200-299 range.

How many times will it call itself recursively? That is controlled by the retries parameter of requestWithRetry, which defaults to 3 in this sample; fetchWithRetry could pass a different value. Some requests should not retry at all (retries=0), some only once (retries=1), and others many times. The count is configurable, and not every request has to follow the same rules.
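
As a rough illustration of per-request retry counts, the sketch below maps a few request categories to a retry budget. The RequestKind enum, the retryBudget function, and the numbers are all hypothetical; nothing like them exists in the sample project.

```swift
// Hypothetical request categories; these names are not in the sample project.
enum RequestKind {
    case userInitiatedFetch   // the user is waiting on the result
    case backgroundRefresh    // nobody is watching; a later refresh will catch up
    case analyticsPing        // safe to drop
}

// Assumed budgets for illustration only; tune them per app.
func retryBudget(for kind: RequestKind) -> Int {
    switch kind {
    case .userInitiatedFetch: return 3   // same as requestWithRetry's default
    case .backgroundRefresh:  return 1
    case .analyticsPing:      return 0
    }
}
```

The returned value would then be passed as the retries argument, for example requestWithRetry(with: request, retries: retryBudget(for: .backgroundRefresh)).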

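If requestWithRetry receives a non-2xx response and the retry count is > 0, it calls itself with the retry counter decremented. If it receives a non-2xx response with retry count == 0, it gives up and returns the error information to fetchWithRetry, as the original version of the service did. Note that in this sample a transport-level error (error != nil, for example a dropped connection) is passed straight back without a retry; only HTTP status failures trigger the recursion.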

If a success occurs during any retry, the calling app will be unaware that there was any issue during the automatic retry sequence.
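
The recursion is easier to follow with the networking stripped out. The sketch below keeps the same decision logic as requestWithRetry but swaps URLSession for a stub that reports a status code. The StatusSource type, the attempt function, and the runDemo helper are hypothetical names invented for this example.

```swift
// A stand-in for session.dataTask: it calls back with an HTTP-like status code.
// Hypothetical type for illustration; the real code uses URLSession.
typealias StatusSource = (@escaping (Int) -> Void) -> Void

// Same branching as requestWithRetry, minus the networking.
func attempt(using source: @escaping StatusSource,
             retries: Int = 3,
             completion: @escaping (_ statusCode: Int, _ retriesLeft: Int) -> Void) {
    source { statusCode in
        if (200...299).contains(statusCode) {
            completion(statusCode, retries)     // success: report immediately
        } else if retries > 0 {
            attempt(using: source,              // failure: recurse with one fewer retry
                    retries: retries - 1,
                    completion: completion)
        } else {
            completion(statusCode, retries)     // out of retries: surface the failure
        }
    }
}

// Runs attempt against a stub that returns 503 a fixed number of times, then 200.
func runDemo(failuresBeforeSuccess: Int) -> (status: Int, retriesLeft: Int, calls: Int) {
    var calls = 0
    let flaky: StatusSource = { callback in
        calls += 1
        callback(calls <= failuresBeforeSuccess ? 503 : 200)
    }

    var result = (status: -1, retriesLeft: -1)
    attempt(using: flaky) { status, left in
        result = (status, left)
    }
    return (result.status, result.retriesLeft, calls)
}

let recovered = runDemo(failuresBeforeSuccess: 2)
// recovered is (status: 200, retriesLeft: 1, calls: 3)
```

The caller's completion closure runs exactly once and only sees the final 200; the two failed attempts stay hidden inside the recursion. With a stub that never succeeds, the source is called four times in total: the first attempt plus three retries.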

A peek at errors logged during retry

In the revised code (with retry), there are some print() statements to trace what's happening in the fetchWithRetry -> requestWithRetry sequence.  The test app is passing an invalid URL for the error test, so no matter how many retries occur, the end result will be a 404/Not Found error from the web server.

The following illustrates what happens within the NetworkClient as the repeated retries fail.

Received status code 404 with 3 retries remaining. RETRYING VIA RECURSIVE CALL.

Received status code 404 with 2 retries remaining. RETRYING VIA RECURSIVE CALL.

Received status code 404 with 1 retries remaining. RETRYING VIA RECURSIVE CALL.

Received status code 404 with 0 retries remaining. EXIT WITH FAILURE.

However, in a real app if one of the retries succeeded--say the 2nd or even 3rd--the valid JSON response would have been deserialized and returned up through the call stack to the UI.  The user may have noticed a slight delay, but otherwise would be unaware of the issue.

Front-end experience with retry failures

Here's the front-end user experience with the updated NetworkClient.

  • The left is no different--the service returned a 200 response. We don't know how many attempts it took to get that response. It may have succeeded on the first attempt or on any of the three retries.
  • The right shows that even with 3 retries, we may still get an error. The app is updated to show that it did retry 3 times before (finally) returning the 404 error.

Conclusion

Since mobile apps are, by definition, used over network connections of varying quality, making them resilient to intermittent network disruptions is an important technical design goal.

This design pattern, automatic retry, is one technique to help keep users engaged and less distracted as they use our apps.

Of course, automatic retries aren't a fix-all. Servers can be so saturated that all retries fail; user network connections can be so poor that requests simply can't get to the server or back to the client. But attempting auto-recovery is one of many tools in the box to help keep technical issues in the background of the user experience rather than the foreground.

For source code for the sample application, visit my GitHub repository.