These are my notes from watching Apple’s WWDC 2017 Session 703, “Introducing Core ML,” and Session 710, “Core ML in Depth.”

Introducing Core ML

Apple Uses

  • Photos: People & Scene Recognition
  • Keyboard: Next word prediction, smart responses
  • Watch: Smart responses & handwriting recognition


  • Real-time image recognition
  • Text prediction
  • Entity recognition
  • Handwriting recognition
  • Style transfer
  • Sentiment analysis
  • Search ranking
  • Machine translation
  • Image captioning
  • Personalization
  • Face detection
  • Emotion detection
  • Speaker identification
  • Music tagging
  • Text summarization

Whoa, that’s a lot of examples.


Recognizing a rose:

  1. Start by color
  2. Try shape
  3. … It gets complicated fast

Rather than describing what a rose looks like programmatically, we will describe a rose empirically.

– Gaurav Kapoor

Two Steps to Machine Learning

  1. Training
  2. Inference

Training a Model

  1. Collect sample data (“roses, sunflowers, lilies”)
  2. Pass through a learning algorithm
  3. Generate a model

Inference


  1. Pass an image into the model
  2. Get back a result and confidence level.
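
The two steps above can be sketched with a toy example. This is not how Core ML or the session's flower model works internally; it is a minimal nearest-centroid classifier, assuming each photo has already been reduced to an average (red, green, blue) triple. The sample values, the `train` and `predict` names, and the confidence formula are all made up for illustration.

```python
from math import dist
from statistics import mean

# Hypothetical training data: average (red, green, blue) per photo, plus a label.
SAMPLES = [
    ((0.90, 0.10, 0.20), "rose"),
    ((0.80, 0.20, 0.30), "rose"),
    ((0.90, 0.80, 0.10), "sunflower"),
    ((1.00, 0.90, 0.20), "sunflower"),
    ((0.90, 0.90, 0.90), "lily"),
    ((1.00, 1.00, 0.80), "lily"),
]


def train(samples):
    """Training: reduce the labeled samples to one centroid per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(channel) for channel in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Inference: return the nearest label and a confidence in (0, 1]."""
    distances = {label: dist(features, c) for label, c in model.items()}
    # Closer centroids get larger weights; normalizing makes them sum to 1.
    weights = {label: 1.0 / (d + 1e-9) for label, d in distances.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total


model = train(SAMPLES)                              # step 1: training
label, confidence = predict(model, (0.7, 0.2, 0.2))  # step 2: inference
print(label, round(confidence, 2))
```

The "model" here is just the dictionary of centroids that `train` returns, which is the point of the split: training happens once, and inference only needs the resulting model.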


  • Prove correctness
  • Performance
  • Energy Efficiency

Three frameworks: Vision & NLP sit on top of CoreML.

CoreML is built on top of Accelerate and MPS.

  • Domain agnostic
  • Inputs: Images, text, dictionaries, raw numbers.
  • Accelerate is good for math functionality.

Advantages to Running Locally

  • Privacy
  • Data Cost
  • No server cost
  • Always available

Real Time Image Recognition

  • No latency


  • Xcode integrations


A model is a function that “happens to be learned from data.” Each takes an input and gives an output.

Model Types:

  • Feed Forward Neural Networks
  • Convolutional Neural Networks (image/video)
  • Recurrent Neural Networks (text based applications)
  • Tree Ensembles
  • Support Vector Machines
  • Generalized Linear Models

Focus on the use-case and let CoreML handle the details. Models are single documents.

  • Inputs, types, outputs
  • Structure of neural network
  • Trained parameters

Where do models come from?

  • Apple has some ready-to-use models.
  • The machine learning community: Caffe, Keras, dmlc XGBoost, scikit-learn, Turi, LIBSVM

To convert a trained model to the CoreML format, use Apple’s Core ML Tools Python package.
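
As a rough sketch of what a conversion looks like, the function below trains a tiny scikit-learn model and saves it as an `.mlmodel` file with the `coremltools` package. The training data, the feature names (`red`, `green`, `blue`), the output name (`flower`), the file name, and the metadata values are all placeholders I made up. It assumes `coremltools` and a scikit-learn version its converter supports are installed.

```python
def train_and_convert(output_path="FlowerClassifier.mlmodel"):
    """Train a tiny scikit-learn classifier and save it in Core ML format."""
    # Imported inside the function so the sketch loads without the packages.
    import coremltools
    from sklearn.linear_model import LogisticRegression

    # Hypothetical data: average (red, green, blue) per photo.
    features = [
        [0.9, 0.1, 0.2],
        [0.8, 0.2, 0.3],
        [0.9, 0.8, 0.1],
        [1.0, 0.9, 0.2],
    ]
    labels = ["rose", "rose", "sunflower", "sunflower"]

    classifier = LogisticRegression().fit(features, labels)

    # Convert the fitted model, naming its inputs and its class-label output.
    coreml_model = coremltools.converters.sklearn.convert(
        classifier,
        input_features=["red", "green", "blue"],
        output_feature_names="flower",
    )

    # Optional metadata stored with the model (placeholder values).
    coreml_model.author = "Example Author"
    coreml_model.license = "MIT"
    coreml_model.short_description = "Toy flower classifier from color averages."

    coreml_model.save(output_path)
    return output_path
```

Calling `train_and_convert()` should write `FlowerClassifier.mlmodel` to the current directory. Other frameworks in the list above have their own converters, and which ones are available depends on the `coremltools` version.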

Development Flow

  • Collect data
  • Train model
  • Drag model into Xcode.

Xcode shows the model’s name, file size, author, license, inputs, and outputs. It also generates Swift code, asynchronously, for loading the model and making predictions with it.

  • Model sizes: How does compression work?

  • Type of file is abstracted.
  • Strongly typed inputs.

Generated Source

  • Input, output, and classifier classes.
  • Offers access to the underlying MLModel for programmatic access.
  • MLModel has an MLModelDescription and a protocol-based prediction method that accepts any type conforming to MLFeatureProvider.
  • The .mlmodel file format is protobuf based.

Core ML in Depth

  • CoreML provides a functional abstraction for machine learning models.

Types of CoreML Inputs

  • Numeric: Double, Int64
  • Categories: String, Int64
  • Images: CVPixelBuffer
  • Arrays: MLMultiArray (New type - why?)
  • Dictionaries: [String: Double], [Int64: Double]

Working With Text

Sentiment analysis example: Takes text, passes it to the model, and the model returns an emoji (happy/ok/sad).

  • Approach: Operates on word counts.
  • NSLinguisticTagger to tokenize and count words
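
To make the word-count idea concrete, here is a small sketch of the same preprocessing step. It is a stand-in, not the session's code: on device the tokenizing is done with NSLinguisticTagger, while this version uses a simple regular expression. The `word_counts` name and the sample sentence are my own.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text, split it into words, and count each word."""
    # A crude tokenizer: runs of letters and apostrophes count as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    return dict(Counter(tokens))


counts = word_counts("Core ML is great. Really, really great!")
print(counts)
```

The result is a dictionary keyed by word, which lines up with the `[String: Double]` dictionary input type listed earlier once the counts are converted to doubles.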

  • A “Pipeline Classifier” does a few things before returning a prediction. It takes a dictionary and returns a sentiment label plus sentiment scores between 0 and 1.
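
A pipeline of this shape can be sketched in two stages: turn the word-count dictionary into a fixed-order vector, then score each label. This is only an illustration of the dictionary-in, label-and-scores-out interface. The weights are invented rather than learned, and the linear-scores-plus-softmax step is my assumption, not necessarily what the session's model uses.

```python
from math import exp

# Hypothetical per-word weights for each label; a trained model would learn these.
WEIGHTS = {
    "happy": {"great": 1.5, "love": 2.0, "fine": 0.2},
    "ok": {"fine": 1.0, "okay": 1.2},
    "sad": {"bad": 1.5, "hate": 2.0, "broken": 1.0},
}


def vectorize(counts, vocabulary):
    """Stage 1: turn the word-count dictionary into a fixed-order vector."""
    return [float(counts.get(word, 0)) for word in vocabulary]


def classify(counts):
    """Run both stages and return (label, scores); the scores sum to 1."""
    vocabulary = sorted({w for weights in WEIGHTS.values() for w in weights})
    vector = vectorize(counts, vocabulary)

    # Stage 2: one linear score per label over the vectorized counts.
    raw = {}
    for label, weights in WEIGHTS.items():
        raw[label] = sum(
            weights.get(word, 0.0) * value
            for word, value in zip(vocabulary, vector)
        )

    # Softmax maps the raw scores into the range 0 to 1.
    peak = max(raw.values())
    exps = {label: exp(score - peak) for label, score in raw.items()}
    total = sum(exps.values())
    scores = {label: value / total for label, value in exps.items()}
    return max(scores, key=scores.get), scores


label, scores = classify({"really": 2, "great": 2, "core": 1})
print(label, scores)
```

Words outside the vocabulary ("really" and "core" here) are simply ignored by the vectorizing stage, so the caller can pass raw word counts without filtering them first.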