Intro
In the previous article, 1/5 iOS Machine Learning Architecture & Tools, all the prerequisites were covered so that any iOS developer working with Swift can feel comfortable with the content of this article. If you have not read it yet - I encourage you to do so.
In this article, four native domain-specific frameworks, powered by Machine Learning models under the hood and used in intelligent iOS application development, will be presented in more depth. The frameworks are easy to use, and in fact, you might have already used them without thinking about the Machine Learning complexity behind them.
Once the capabilities of the frameworks are presented and clear, we will see how to use them in practice. You will be provided with the source code of a demo iOS application written in Swift and SwiftUI. The application demonstrates how to implement intelligent features using the previously presented domain-specific Machine Learning frameworks.
iOS domain-specific frameworks powered by Machine Learning
Let’s go through an introduction to each framework and see whether it could boost the intelligence of the beloved apps we are working on.
Vision
As the name of the framework suggests - its purpose is to perform a variety of tasks on input images and video. Here are some of the things Vision is capable of:
- Barcode Detection
- Image Classification, Saliency & Alignment
- Image Similarity
- Object Detection
- Moving Object Tracking in Video
- Trajectory Detection of Moving Object in Video
- Contour Detection
- Text Detection & Recognition
- Face Detection & Tracking
- Face Landmarks
- Human Body Detection
- Body Pose
- Hand Pose
- Animal Recognition
And that is not all. If you are interested in using Vision - there is no better place to start than the official Vision documentation.
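To get a feeling for the API shape before diving into the demo, here is a minimal sketch of image classification using Vision’s built-in VNClassifyImageRequest (this snippet is an illustration and not part of the demo app):

import Vision

// A minimal sketch: classify the contents of a CGImage
// using Vision's built-in image classifier.
func classify(_ cgImage: CGImage) {
    let request = VNClassifyImageRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])

    // Each observation carries a label identifier and a confidence score
    let observations = request.results as? [VNClassificationObservation] ?? []
    for observation in observations.prefix(3) {
        print("\(observation.identifier): \(observation.confidence)")
    }
}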
Natural Language
Whenever you work with text and want to introduce some intelligence - the Natural Language framework should be your first choice. Sometimes it provides amazing results with very little effort. The main features of Natural Language are:
- Language detection
- Sentiment Analysis
- Word & Sentence Embedding
- Tokenization
Automatically detecting and preselecting the language of a user’s text input is a small but nice detail, and things like that are super simple to implement using the Natural Language framework, as the sketch below shows. Whether the goal is simple or more complex - the documentation always helps.
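For example, sentiment analysis takes only a few lines. A minimal sketch, assuming the text fits in a single paragraph (the function name is made up for illustration):

import NaturalLanguage

// A minimal sketch: NLTagger returns a sentiment score
// between -1.0 (most negative) and 1.0 (most positive).
func sentimentScore(for text: String) -> Double? {
    let tagger = NLTagger(tagSchemes: [.sentimentScore])
    tagger.string = text
    let (tag, _) = tagger.tag(at: text.startIndex, unit: .paragraph, scheme: .sentimentScore)
    return tag.flatMap { Double($0.rawValue) }
}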
Speech
Thinking about implementing speech recognition in your iOS app? The Speech framework is all you need to automatically generate transcripts, alternative interpretations, and confidence levels of the results from live or prerecorded audio. Combine the Speech capabilities with Natural Language and the sky is the limit of what can be achieved. With the help of the Speech framework documentation, speech recognition will be implemented in your iOS app in no time.
Sound Analysis
The Sound Analysis framework is dedicated to helping you analyze and classify sound. Its main power is utilized when you already have a custom Core ML sound classification model. Such models can be downloaded, or you can train them yourself using Create ML or other technologies.
Do you feel amazed when an auto mechanic diagnoses a problem in your car just by listening to and recognizing the specific noise coming from the engine? Well, now we have the tools to make an app that is capable of doing that.
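To illustrate what that could look like in code, here is a minimal sketch of classifying an audio file with a custom Core ML sound classification model (the class name and the model are hypothetical):

import CoreML
import SoundAnalysis

// A minimal sketch: run a Core ML sound classifier over an audio file.
final class EngineSoundClassifier: NSObject, SNResultsObserving {

    func classifySound(at url: URL, using model: MLModel) throws {
        let request = try SNClassifySoundRequest(mlModel: model)
        let analyzer = try SNAudioFileAnalyzer(url: url)
        try analyzer.add(request, withObserver: self)
        analyzer.analyze() // processes the whole file, reporting results to the observer
    }

    // Called every time the analyzer produces a classification result
    func request(_ request: SNRequest, didProduce result: SNResult) {
        guard let result = result as? SNClassificationResult,
              let best = result.classifications.first else { return }
        print("Detected \(best.identifier) with confidence \(best.confidence)")
    }
}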
Demo iOS Notes application with Machine Learning features
To demonstrate how easy it is to implement Machine Learning powered features in an iOS app, I have created a demo notes application. The application has three Machine Learning powered features:
- Recognizing the text in the input image
- Recognizing the speech and extracting the transcript from an audio file
- Showing the language of the input text
Let’s go through the implementation of each feature. You’re welcome to download the project from the GitHub repo to make it easier to follow the code.
How to detect and recognize text in images with Swift and Vision in iOS applications
For detecting and recognizing text in images we will need to use the Vision framework. First, we want to create a text recognition request, VNRecognizeTextRequest.
private var textRecognitionRequest: VNRecognizeTextRequest {
    // 1. Creating the text recognition request
    return VNRecognizeTextRequest(completionHandler: { [weak self] (request, error) in
        if let recognizedText = request.results as? [VNRecognizedTextObservation] {
            // 2. If the request provides results - take the top candidate from each observation
            let transcript = recognizedText.reduce("") { result, observation in
                guard let candidate = observation.topCandidates(1).first?.string else { return result }
                return result.appending(candidate) + "\n"
            }
            DispatchQueue.main.async { [weak self] in
                // 3. Inform the delegate about successful text recognition
                self?.delegate?.didRecognizeTextFromImage(transcript)
            }
        }
    })
}
Once we have the VNRecognizeTextRequest, the only missing part is creating a VNImageRequestHandler, which processes the previously created request.
func recognizeText(in cgImage: CGImage) {
    // 1. Sending the request off the main thread
    DispatchQueue.global(qos: .userInitiated).async { [weak self] in
        guard let self = self else { return }
        // 2. Creating the object that processes image analysis requests
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        do {
            // 3. Performing the previously created request
            try handler.perform([self.textRecognitionRequest])
        } catch {
            self.delegate?.didFailRecognizeTextFromImage()
        }
    }
}
That’s it! The full implementation of TextRecognitionService is in the GitHub repo.
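One thing worth knowing: VNRecognizeTextRequest can be tuned before it is performed. A small sketch of the available configuration (the values shown are just examples):

let request = VNRecognizeTextRequest { request, error in
    // Handle the results as shown above
}
request.recognitionLevel = .accurate      // prefer accuracy over speed (.fast is the alternative)
request.recognitionLanguages = ["en-US"]  // constrain the candidate languages
request.usesLanguageCorrection = true     // apply language-based correction to the candidates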
How to detect speech and extract text from audio files with Swift and Speech in iOS applications
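The whole feature boils down to a single method: we create an SFSpeechRecognizer and run a recognition task with an SFSpeechURLRecognitionRequest pointing at the audio file.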
func recognizeText(fromAudioFileWith url: URL) {
    // 1. Creating the speech recognizer object
    guard let speechRecognizer = SFSpeechRecognizer(), speechRecognizer.isAvailable else {
        delegate?.didFailRecognizeFromAudio()
        return
    }
    // 2. Creating and executing the speech recognition request
    let request = SFSpeechURLRecognitionRequest(url: url)
    speechRecognizer.recognitionTask(with: request) { [weak self] (result, error) in
        guard let result = result else {
            self?.delegate?.didFailRecognizeFromAudio()
            return
        }
        // 3. Handling the speech recognition results
        if result.isFinal {
            self?.delegate?.didRecognizeTextFromAudio(result.bestTranscription.formattedString)
        }
    }
}
This is all the code needed to implement the feature of detecting and extracting speech from audio files using the Speech framework. The full implementation of SpeechRecognitionService is in the GitHub repo.
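One practical note: speech recognition requires the user’s permission, so a real app has to request authorization before running any recognition tasks (and declare an NSSpeechRecognitionUsageDescription entry in its Info.plist). A minimal sketch:

import Speech

// Ask the user for speech recognition permission before starting any tasks
SFSpeechRecognizer.requestAuthorization { status in
    DispatchQueue.main.async {
        guard status == .authorized else {
            // Handle the denied / restricted case, e.g. by hiding the feature
            return
        }
        // Safe to start recognition tasks from here on
    }
}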
How to identify the language of the text with Swift and Natural Language in iOS applications
In the GIF images provided above, you might have noticed that after successful text recognition, the language of the text is identified. This functionality is implemented using the Natural Language framework.
func detectedLanguage(for string: String) -> String? {
    // 1. Finding the dominant language of the given text
    guard let languageCode = NLLanguageRecognizer.dominantLanguage(for: string)?.rawValue else {
        return nil
    }
    // 2. Returning a localized, human-readable name of the detected language
    return Locale.current.localizedString(forIdentifier: languageCode)
}
The whole implementation of this feature is only a couple of lines of Swift code - isn’t it amazing?
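If you also need confidence values instead of just the single dominant language, NLLanguageRecognizer can return ranked hypotheses. A small sketch (the sample string is made up):

import NaturalLanguage

let recognizer = NLLanguageRecognizer()
recognizer.processString("Machine Learning on iOS can be this simple")

// The top 2 language guesses with their probabilities
let hypotheses = recognizer.languageHypotheses(withMaximum: 2)
for (language, confidence) in hypotheses {
    print("\(language.rawValue): \(confidence)")
}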
Conclusion
Implementing Machine Learning powered features using domain-specific frameworks in iOS applications can be super simple. Give it a shot, spin up a playground in Xcode, and try it yourself. Share your questions and results in the comments!