How to scan a credit card using Apple Vision and VisionKit natively in iOS Swift with SwiftUI and UIKit

Khalid Asad
6 min read · May 14, 2021
Apple’s Vision and VisionKit are truly amazing tools for Text and Image Recognition!

Vision has been around since iOS 11, but with iOS 13 Apple added text recognition to Vision and introduced VisionKit with its built-in document camera. Together they make it much easier for developers to scan images and text for vital information.

We’re going to leverage this to scan credit cards for important information!

I’ll skip project creation and setup and jump straight to the things we need for this to work:

  • In our Info.plist file, we need to add a new row called “Privacy - Camera Usage Description” (the NSCameraUsageDescription key) and give its String value a description. For example: “Allow camera usage to obtain credit card credentials from credit cards.”
  • Add the import statements for the files using them:
    import Vision
    import VisionKit

Now that we’ve got that out of the way, let’s think of our black box approach.
We need to:

  1. Output some sort of structure with the strings we need.
  2. Scan a card and save it as an image.
  3. Create an algorithm to parse the text returned from image text recognition.

Step 1 — Design an output structure

Let’s design a struct called CardDetails as this will tell us what exactly we need to parse:

public struct CardDetails {
    public var number: String?
    public var name: String?
    public var expiryDate: String?
}
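
Every field is optional because a scan may miss any of them. As a small illustration, here is a sketch of how a caller might find out which values still need manual entry. The missingFields property and the sample values are hypothetical and not part of the original project.

```swift
// The article's struct, repeated so the snippet stands alone.
public struct CardDetails {
    public var number: String?
    public var name: String?
    public var expiryDate: String?
}

extension CardDetails {
    /// Hypothetical helper: names of the fields the scan could not fill.
    var missingFields: [String] {
        var missing = [String]()
        if number == nil { missing.append("number") }
        if name == nil { missing.append("name") }
        if expiryDate == nil { missing.append("expiryDate") }
        return missing
    }
}

// A scan that found the number and the expiry date but no name.
let partial = CardDetails(number: "4111 1111 1111 1111", name: nil, expiryDate: "12/25")
print(partial.missingFields) // ["name"]
```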

Step 2 — Create a SwiftUI View equivalent of a UIViewController conformance to VNDocumentCameraViewController

With Vision and VisionKit imported, we need to create a View Controller that will handle the reading. Since it is 2021, let’s also convert this View Controller into a SwiftUI View using UIViewControllerRepresentable.

We’ll start off by creating a struct called CardReaderView which conforms to UIViewControllerRepresentable. Let the compiler auto generate the required functions and then let’s fill them out.

For the UIViewControllerType, we’ll use VNDocumentCameraViewController which is a pre-built document scanner which recognizes shapes and converts them into images for us.

Copy the following code into your struct:

private let completionHandler: (CardDetails?) -> Void

init(completionHandler: @escaping (CardDetails?) -> Void) {
    self.completionHandler = completionHandler
}

public typealias UIViewControllerType = VNDocumentCameraViewController

public func makeUIViewController(context: UIViewControllerRepresentableContext<CardReaderView>) -> VNDocumentCameraViewController {
    let viewController = VNDocumentCameraViewController()
    viewController.delegate = context.coordinator
    return viewController
}

public func updateUIViewController(_ uiViewController: VNDocumentCameraViewController, context: UIViewControllerRepresentableContext<CardReaderView>) { }

This creates a VNDocumentCameraViewController and assigns the coordinator as its delegate. The completionHandler we stored is what hands the result back to the caller once scanning finishes.

Now we need to create a class as a Coordinator that will interface with the UIViewControllerRepresentable protocol.

public func makeCoordinator() -> Coordinator {
    Coordinator(completionHandler: completionHandler)
}

final public class Coordinator: NSObject, VNDocumentCameraViewControllerDelegate {
    private let completionHandler: (CardDetails?) -> Void

    init(completionHandler: @escaping (CardDetails?) -> Void) {
        self.completionHandler = completionHandler
    }

    public func documentCameraViewController(_ controller: VNDocumentCameraViewController, didFinishWith scan: VNDocumentCameraScan) {
        print("Document camera view controller did finish with ", scan)
        // Only the first scanned page is used.
        let image = scan.imageOfPage(at: 0)
        validateImage(image: image) { cardDetails in
            self.completionHandler(cardDetails)
        }
    }

    public func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
        completionHandler(nil)
    }

    public func documentCameraViewController(_ controller: VNDocumentCameraViewController, didFailWithError error: Error) {
        print("Document camera view controller did finish with error ", error)
        completionHandler(nil)
    }
}

This implements the VNDocumentCameraViewControllerDelegate functions, and the Coordinator receives the same completionHandler we stored in the view above.

The delegate gives us 3 functions: didFinishWith, didCancel and didFailWithError. All of them use the completionHandler to report back to the caller.

We need to take the scan from the didFinishWith function and grab the image of the page at the first index (take the first image) and then perform a text recognition request on it.

Step 3 — Text Recognition and Custom Parsing Logic

We’ll create another function inside the Coordinator and call it validateImage.

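func validateImage(image: UIImage?, completion: @escaping (CardDetails?) -> Void) {
    // Vision works on a CGImage, so bail out if the scan does not provide one.
    guard let cgImage = image?.cgImage else { return completion(nil) }
    // The request setup and handler code from the following snippets go here.
}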

Inside this, we need to construct our VNRecognizeTextRequest.

var textRecognitionRequest = VNRecognizeTextRequest()
textRecognitionRequest.recognitionLevel = .accurate
textRecognitionRequest.usesLanguageCorrection = false
textRecognitionRequest.customWords = ["Expiry Date", "Good Thru"]

We can utilize some of the request variables for a more fine tuned detection.

recognitionLevel can be either .fast or .accurate. I recommend accuracy over speed.

usesLanguageCorrection can be useful in many cases, but I prefer to leave it off.

customWords supplements the recognizer’s vocabulary so that expected phrases are less likely to be misread as numbers or symbols. Note that Vision only consults this list when usesLanguageCorrection is true, so with correction turned off it has no effect.

With that constructed, we need to generate an array of strings from the request block:

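var recognizedText = [String]()

textRecognitionRequest = VNRecognizeTextRequest { request, error in
    // Leave recognizedText empty if Vision returned no text observations.
    guard
        let requestResults = request.results as? [VNRecognizedTextObservation],
        !requestResults.isEmpty
    else { return }
    // Keep the most likely string for each observation.
    recognizedText = requestResults.compactMap { observation in
        observation.topCandidates(1).first?.string
    }
}
// This replaces the request created earlier, so its settings need to be applied again.
textRecognitionRequest.recognitionLevel = .accurate
textRecognitionRequest.usesLanguageCorrection = false
textRecognitionRequest.customWords = ["Expiry Date", "Good Thru"]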

This grabs the request results and keeps the top candidate string from each observation, skipping any observation without one. If there are no results at all, it bails out early.

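Defining the request does not run it. Vision only invokes the request’s closure once an image request handler performs the request, and perform runs synchronously, so by the time it returns recognizedText has been filled in and we can call the completion handler: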

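// cgImage is the scanned image's CGImage (image?.cgImage), unwrapped earlier in validateImage.
let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
do {
    // perform(_:) blocks until the request's closure has run.
    try handler.perform([textRecognitionRequest])
    completion(recognizedText.isEmpty ? nil : parseResults(for: recognizedText))
} catch {
    print(error)
    // Report the failure so the caller is not left waiting.
    completion(nil)
}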
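
Since perform(_:) blocks until recognition finishes, a caller on the main thread waits along with it. Here is a minimal sketch of one way around that. It assumes a hypothetical helper named recognizeText(in:completion:), which is not part of the original project, and it uses only the Vision calls already shown in this article. The work runs on a background queue and the recognized strings are delivered on the main queue.

```swift
import Foundation
import Vision

/// Hypothetical helper (not from the original project): recognizes text in
/// `cgImage` on a background queue and reports the strings on the main queue.
func recognizeText(in cgImage: CGImage, completion: @escaping ([String]) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        var recognizedText = [String]()

        // The closure runs while perform(_:) below is executing.
        let request = VNRecognizeTextRequest { request, _ in
            let observations = request.results as? [VNRecognizedTextObservation] ?? []
            recognizedText = observations.compactMap { $0.topCandidates(1).first?.string }
        }
        request.recognitionLevel = .accurate
        request.usesLanguageCorrection = false

        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        do {
            try handler.perform([request])
        } catch {
            print("Text recognition failed:", error)
        }

        // Copy the result and hop back to the main queue for the caller.
        let result = recognizedText
        DispatchQueue.main.async { completion(result) }
    }
}
```

The completion always fires exactly once, with an empty array when nothing was recognized or the request failed, so the caller can decide whether to pass the strings on to parsing.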

However, before we call the completion handler we still need to do some custom parsing, so let’s create another function called parseResults.

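func parseResults(for recognizedText: [String]) -> CardDetails {
    // Credit card number: at least 15 characters, starting with 4, 5, 3, or 6.
    let leadingDigits: [Character] = ["4", "5", "3", "6"]
    let creditCardNumber = recognizedText.first(where: { text in
        guard let first = text.first else { return false }
        return text.count > 14 && leadingDigits.contains(first)
    })

    // Expiry date: at least 5 characters containing a "/"; keep only the digits and the slash.
    let expiryDateString = recognizedText.first(where: { $0.count > 4 && $0.contains("/") })
    let expiryDate = expiryDateString?.filter({ $0.isNumber || $0 == "/" })

    // Name: the last recognized string that is not the number, the expiry date, or a known card term.
    let ignoreList = ["GOOD THRU", "GOOD", "THRU", "Gold", "GOLD", "Standard", "STANDARD", "Platinum", "PLATINUM", "WORLD ELITE", "WORLD", "ELITE", "World Elite", "World", "Elite"]
    // CardType is not defined in this article; it is assumed to be a String-backed,
    // CaseIterable enum of card brands from the full project.
    let cardTypes = CardType.allCases.map { $0.rawValue }
    let wordsToAvoid = [creditCardNumber, expiryDateString].compactMap { $0 } +
        ignoreList +
        cardTypes +
        cardTypes.map { $0.lowercased() } +
        cardTypes.map { $0.uppercased() }
    let name = recognizedText.filter({ !wordsToAvoid.contains($0) }).last

    return CardDetails(number: creditCardNumber, name: name, expiryDate: expiryDate)
}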

You can have your own custom parsing logic here, but I kept it simple.

  • If a string is at least 15 characters long and its first character is 4, 5, 3, or 6, then it is likely a credit card number.
  • If a string is at least 5 characters long and contains a /, then it is most likely a card expiry date.
  • We also want to maintain a list of words to ignore so we can isolate the name of the credit card holder (words like World Elite, Platinum, Gold, etc.).
  • Return a constructed CardDetails struct with all the values we require!
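
To try these rules without a camera, here is a self-contained sketch that runs them on a hand-written list of strings. It is a simplification rather than the project’s exact code: brandNames is a hypothetical stand-in for the CardType raw values, the ignore list is shortened and compared case-insensitively, and the sample strings are made up.

```swift
import Foundation

// Mirrors the article's CardDetails struct.
struct CardDetails {
    var number: String?
    var name: String?
    var expiryDate: String?
}

// Hypothetical stand-in for the project's CardType raw values.
let brandNames = ["Visa", "Mastercard", "Amex", "Discover"]

func parseResults(for recognizedText: [String]) -> CardDetails {
    // Rule 1: at least 15 characters, starting with 3, 4, 5, or 6.
    let leadingDigits: Set<Character> = ["3", "4", "5", "6"]
    let number = recognizedText.first(where: { line in
        guard let first = line.first else { return false }
        return line.count > 14 && leadingDigits.contains(first)
    })

    // Rule 2: at least 5 characters containing a slash; keep digits and the slash.
    let expiryLine = recognizedText.first(where: { $0.count > 4 && $0.contains("/") })
    let expiryDate = expiryLine?.filter { $0.isNumber || $0 == "/" }

    // Rule 3: skip known card terms (case-insensitively) and take the last remaining line.
    let cardTerms = ["GOOD THRU", "GOLD", "PLATINUM", "WORLD ELITE"] + brandNames
    let ignored = Set(cardTerms.map { $0.uppercased() })
    let name = recognizedText.last(where: { line in
        line != number && line != expiryLine && !ignored.contains(line.uppercased())
    })

    return CardDetails(number: number, name: name, expiryDate: expiryDate)
}

// Made-up strings in the order a recognizer might return them.
let sample = ["World Elite", "5412 7512 3412 3456", "GOOD THRU 12/25", "JANE APPLESEED", "Mastercard"]
let details = parseResults(for: sample)
print(details.number ?? "nil")     // 5412 7512 3412 3456
print(details.expiryDate ?? "nil") // 12/25
print(details.name ?? "nil")       // JANE APPLESEED
```

Taking the last remaining line as the name works here because the brand name is on the ignore list; on a real card any unexpected trailing text would be picked up instead, which is why the ignore list matters.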

Now we can add this to our SceneDelegate without a Main storyboard (also remember to update the Main Interface setting in the target and in Info.plist, and to import SwiftUI for UIHostingController):

func scene(_ scene: UIScene, willConnectTo session: UISceneSession, options connectionOptions: UIScene.ConnectionOptions) {
    guard let windowScene = (scene as? UIWindowScene) else { return }
    let window = UIWindow(windowScene: windowScene)
    // Host the SwiftUI CardReaderView and log whatever the scan returns.
    let cardReaderView = CardReaderView(completionHandler: { cardDetails in
        print(String(describing: cardDetails))
    })
    window.rootViewController = UIHostingController(rootView: cardReaderView)
    self.window = window
    window.makeKeyAndVisible()
}

Let’s go ahead and scan a fake credit card generated from https://herramientas-online.com/credit-card-generator-with-name.php

Fake Credit Card

Now this should return a CardDetails struct with the exact values we are looking for!

I’ve taken this further and implemented a full credit card scanning application and framework, with some better parsing techniques and a beautiful form that you can fill in by scanning and then correct manually. Below are the scanning results of a fake generated credit card.

The full project and framework is available with SPM and has an example application to run. You can find it here: https://github.com/khalid-asad/card-reader-ios

This is amazing. And I look forward to what we can do when we push the limits of Vision and VisionKit!

Khalid Asad

I'm an iOS Developer that loves to tinker and create projects that can make a difference. I'm multi-lingual and work on Swift, Node.js and other languages!