Machine learning API

Right now we have 3 database tables:

  • Users

  • Requests

  • Responses

Additionally, we have created a machine learning model along with its input schema. It is time to create a working API with FastAPI to serve predictions.

Loading the ML model to memory

The most efficient way to load an ML model into memory is to do it once, when the FastAPI application starts. A common mistake is to read the model file and the schema file from disk every time a new request comes in and only then apply them.

We should import the model objects and create any additional objects at the top of the main app.py script, where the API object is being created.

The necessary utilities:

!cat ML_API/machine_learning_utils.py
# Pickle object reading 
import pickle 

# JSON object reading 
import json 

# OS traversal 
import os 

# Input dataframe
import pandas as pd 

# Array math 
import numpy as np 

def load_ml_model(model_dir='ml_model'):
    """
    Loads the model and the schema from the given path
    """
    model, type_dict, feature_list = {}, {}, []
    
    _model_path = os.path.join(model_dir, 'model.pkl')
    _input_schema_path = os.path.join(model_dir, 'input_schema.json')

    # Default empty input schema 
    input_schema = {}

    # Checking if the files exist and reading them 
    if os.path.exists(_model_path) and os.path.exists(_input_schema_path):
        
        with open(_model_path, 'rb') as f:
            model = pickle.load(f)

        with open(_input_schema_path, 'r') as f:
            input_schema = json.load(f)
    
    # Extracting the features
    features = input_schema.get('input_schema', {})
    features = features.get('columns', [])

    # Iterating over the list of dictionaries and converting the type strings
    # to Python types: numeric -> float, boolean -> bool.
    # The resulting dictionary maps the feature name to its type.
    for feature in features:
        if feature.get('type') == 'numeric':
            feature['type'] = float
        elif feature.get('type') == 'boolean':
            feature['type'] = bool
        type_dict.update({feature.get('name'): feature.get('type')})
    
    # Extracting the correct ordering of the features for the ML input 
    feature_list = [x.get('name') for x in features]

    # Returning the model, type dictionary and the feature order
    return model, type_dict, feature_list

def predict(model, feature_dict: dict, X: dict) -> list:
    """
    Converts the input dictionary X into the format expected by the ML model
    and returns the predicted probabilities

    Args:
        model: the machine learning model
        feature_dict: dictionary mapping feature names to their types
        X: dictionary with (feature -> feature value) pairs

    Returns:
        A list of predicted probabilities, or None if the input is invalid
    """
    try:
        # Extracting the expected feature names
        feature_list = list(feature_dict.keys())

        # Converting the dictionary to a single-row dataframe 
        X = pd.DataFrame(X, index=[0])

        # Ensuring that no expected columns are missing; absent ones are filled with NaN
        for col in feature_list:
            if col not in X.columns:
                X[col] = np.nan

        # Converting the X columns to the correct types 
        for col in X.columns:
            if col in feature_list:
                try:
                    X[col] = X[col].astype(feature_dict.get(col))
                except (ValueError, TypeError): 
                    print(f"Cannot convert {col} to {feature_dict.get(col)}")
                    # If a value cannot be converted, we signal a bad input 
                    return None 

        # Predicting the probabilities for the single input row
        prediction = model.predict_proba(X[feature_list])[0]

        return prediction
    except Exception:
        return None
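
Before wiring these utilities into the API, we can smoke-test them locally. Below is a minimal sketch; it assumes the model artifacts live in ML_API/ml_model, next to the script above:

# Quick local smoke test of the utilities
from ML_API.machine_learning_utils import load_ml_model, predict

# Loading the model, the type dictionary and the feature order
model, type_dict, feature_list = load_ml_model('ML_API/ml_model')

# A sample observation as (feature -> value) pairs
X = {
    'age': 25, 'creatinine_phosphokinase': 1000, 'ejection_fraction': 35,
    'platelets': 500000, 'serum_creatinine': 8, 'serum_sodium': 135,
    'sex': 1, 'high_blood_pressure': 0
}

# Should print the array of class probabilities for the single row
print(predict(model, type_dict, X))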

The loading of the model occurs right before defining the endpoints:

...

# Creating the application object 
app = FastAPI()

# Loading the machine learning objects to memory 
ml_model, type_dict, ml_feature_list = load_ml_model()

...

Loaded this way, the objects live in the application's runtime memory and are not read from disk every time a new request comes in. This makes the application much faster.
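
If you prefer to keep the loading out of module import time, an equivalent pattern is FastAPI's startup event. A minimal sketch of that alternative (not the layout app.py actually uses):

# Alternative: loading the ML objects once per worker at server startup
from fastapi import FastAPI
from machine_learning_utils import load_ml_model

app = FastAPI()

# Container filled in once, when the server starts
ml_objects = {}

@app.on_event("startup")
def load_model() -> None:
    # Runs once per worker process, not once per request
    model, type_dict, feature_list = load_ml_model()
    ml_objects.update(model=model, type_dict=type_dict, feature_list=feature_list)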

API usage flowchart

A typical flow of the API is the following:

  • Register a user:

[Figure: registration flow]

The output of the registration logic is a JWT token, which we attach to each request to our API.

  • Prediction flow:

[Figure: prediction flow through the API]

Each request to the API needs to have the JWT token attached to it. Then, along with the token, the data for the API is sent and the following flow starts:

  1. The user is authenticated.

  2. If the user is authenticated, the request data is validated against the ML model's input schema.

  3. If the data is valid, the prediction is made.

  4. The final response is sent.

Along the way, the information is logged to the Requests and Responses tables.

All the code is available in the app.py script in the ML_API directory, so let's try and apply the above flowchart!

API usage

# Making HTTP requests
import requests 

# Defining the constants for the API
url = 'http://localhost:8001'

Creating a user

# Defining the user dict 
user_dict = {
    "username": "eligijus_bujokas",
    "password": "password",
    "email": "eligijus@testmail.com"
}

# Sending the post request to the running API 
response = requests.post(f"{url}/register-user", json=user_dict)

# Getting the user id 
user_id = response.json().get("user_id")

# Printing the response 
print(f"Response code: {response.status_code}; Response: {response.json()}")
Response code: 409; Response: {'message': 'User already exists', 'user_id': 5}
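
Note that the endpoint answered with a 409 status code because this user was already registered during an earlier run; the existing user_id is still returned in the payload, so the rest of the flow can proceed.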

Getting the token

# Querying the API for the token 
response = requests.post(f"{url}/token", json=user_dict)

# Extracting the token from the response
token = response.json().get("token")

# Printing the response
print(f"Response code: {response.status_code}; JWT token: {token}")
Response code: 200; JWT token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE2NDIzNDc4MDQsImlhdCI6MTY0MjM0NDIwNCwic3ViIjo1fQ.rrQdzFUO9vHEC1nkFBv7cJVvprn6MaXXp1nncNfH2jU
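
The token above is a standard HS256 JWT whose payload carries exp, iat and sub claims, with sub holding the user id and exp set one hour after iat. Server-side, issuing and verifying such a token could look roughly like the sketch below, assuming the PyJWT package and a hypothetical _SECRET signing key (app.py keeps its own):

# Sketch of JWT issuing and verification with PyJWT (an assumption, not app.py's exact code)
import datetime
import jwt

_SECRET = "change-me"  # hypothetical signing key

def issue_token(user_id: int) -> str:
    now = datetime.datetime.utcnow()
    payload = {"exp": now + datetime.timedelta(hours=1), "iat": now, "sub": user_id}
    return jwt.encode(payload, _SECRET, algorithm="HS256")

def verify_token(token: str) -> int:
    # Raises jwt.InvalidTokenError (e.g. on expiry) for a bad token
    return jwt.decode(token, _SECRET, algorithms=["HS256"])["sub"]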

Getting the predictions

First, let us recap the input that was used to train the model. The features were:

{
  "input_schema": {
    "columns": [
      {
        "name": "age",
        "type": "numeric"
      },
      {
        "name": "creatinine_phosphokinase",
        "type": "numeric"
      },
      {
        "name": "ejection_fraction",
        "type": "numeric"
      },
      {
        "name": "platelets",
        "type": "numeric"
      },
      {
        "name": "serum_creatinine",
        "type": "numeric"
      },
      {
        "name": "serum_sodium",
        "type": "numeric"
      },
      {
        "name": "sex",
        "type": "boolean"
      },
      {
        "name": "high_blood_pressure",
        "type": "boolean"
      }
    ]
  }
}

We will use a POST request to get the probabilities because we want to send the features and their values as a JSON object in the request body rather than as a collection of URL parameters.

# Creating the input dictionary
X = {
    'age': 25,
    'creatinine_phosphokinase': 1000,
    'ejection_fraction': 35,
    'platelets': 500000,
    'serum_creatinine': 8,
    'serum_sodium': 135,
    'sex': 1,
    'high_blood_pressure': 0
}

# Creating the header with the token 
header = {
    'Authorization': token
}

# Sending the request 
response = requests.post(f"{url}/predict", json=X, headers=header)

# Inspecting the response
print(f"Response code: {response.status_code}; Response: {response.json()}")
Response code: 200; Response: {'yhat_prob': '0.5124506', 'yhat': '1'}

The response dictionary has two keys:

  • yhat_prob - the probability of a death event

  • yhat - the predicted class; 1 - death_event, 0 - no_death_event
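
Note that both values arrive as strings, so a client will typically cast them before use:

# Casting the string values from the response to numeric types
result = response.json()
death_probability = float(result["yhat_prob"])  # 0.5124506
predicted_class = int(result["yhat"])           # 1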

The function predict_ml from the app.py file handles the request; its logic follows the steps below.

The steps are:

  1. Extract the token

  2. Authenticate it

  3. Extract the inputs

  4. Log the request to database

  5. Make the prediction

  6. Log the response to database

  7. Return the response to the user
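
Condensed into code, the steps above might look roughly like the sketch below. It is an illustration only: authenticate_token, log_request and log_response are hypothetical stand-ins for the actual helpers in app.py, and the 0.5 decision threshold is an assumption.

from fastapi import Request
from fastapi.responses import JSONResponse

@app.post("/predict")
async def predict_ml(request: Request):
    # 1. Extracting the token and 2. authenticating it
    token = request.headers.get("Authorization")
    user_id = authenticate_token(token)  # hypothetical helper
    if user_id is None:
        return JSONResponse(content={"message": "Invalid token"}, status_code=401)

    # 3. Extracting the inputs and 4. logging the request to the database
    X = await request.json()
    request_id = log_request(user_id, X)  # hypothetical helper

    # 5. Making the prediction with the objects loaded at startup
    prediction = predict(ml_model, type_dict, X)
    if prediction is None:
        return JSONResponse(content={"message": "Bad input data"}, status_code=400)

    # 6. Logging the response and 7. returning it to the user
    output = {"yhat_prob": str(prediction[1]), "yhat": str(int(prediction[1] > 0.5))}
    log_response(request_id, output)  # hypothetical helper
    return JSONResponse(content=output, status_code=200)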

Inspecting the results

All the information needed to track the API is in the Users, Requests and Responses tables. We can inspect them after the run of requests above.

# Importing the connection 
import pandas as pd 
from ML_API.database import engine

# There may be some legacy users besides eligijus_bujokas
users = pd.read_sql('select * from users', engine)
print(f"Users in the database:\n{users}")

# Tail of the requests
requests_data = pd.read_sql('select * from requests', engine)
print(f"--\nLast 5 requests:\n{requests_data.tail(5)}")

# Tail of the responses
response_data = pd.read_sql('select * from responses', engine)
print(f"--\nLast 5 responses:\n{response_data.tail(5)}")
Users in the database:
   id          username                                           password  \
0   4          eligijus  gAAAAABh0ZTKq-0Es9-BzQd-KJ2R8dNY70vFlj3OY_hsC2...   
1   5  eligijus_bujokas  gAAAAABh3uroEwCl80d480inMXMCE1TuHjaBYMgmJI3eKL...   
2  10              test  gAAAAABh4_93mQfHPCQ51tEZ6oLqIZPbADDgIHdNDm2fg9...   

                   email           created_datetime  \
0  eligijus@testmail.com 2022-01-02 14:04:26.751085   
1  eligijus@testmail.com 2022-01-12 16:51:20.534979   
2      test@testmail.com 2022-01-16 13:20:23.508369   

            updated_datetime  enabled  
0 2022-01-02 14:04:26.751085     True  
1 2022-01-12 16:51:20.534979     True  
2 2022-01-16 13:20:23.519665    False  
--
Last 5 requests:
    id  user_id                                              input  \
20  21        5  {"age": 25, "creatinine_phosphokinase": 1000, ...   
21  22        5  {"age": 25, "creatinine_phosphokinase": 1000, ...   
22  23        5  {"age": 25, "creatinine_phosphokinase": 1000, ...   
23  24        5  {"age": 25, "creatinine_phosphokinase": 1000, ...   
24  25        5  {"age": 25, "creatinine_phosphokinase": 1000, ...   

             created_datetime           updated_datetime  
20 2022-01-14 20:37:26.669542 2022-01-14 20:37:26.669542  
21 2022-01-14 20:42:49.979033 2022-01-14 20:42:49.979033  
22 2022-01-16 13:22:03.372336 2022-01-16 13:22:03.372336  
23 2022-01-16 13:26:30.148125 2022-01-16 13:26:30.148125  
24 2022-01-16 14:43:24.945873 2022-01-16 14:43:24.945873  
--
Last 5 responses:
    id  request_id                                   output  \
16  17          21  {"yhat_prob": "0.5124506", "yhat": "1"}   
17  18          22  {"yhat_prob": "0.5124506", "yhat": "1"}   
18  19          23  {"yhat_prob": "0.5124506", "yhat": "1"}   
19  20          24  {"yhat_prob": "0.5124506", "yhat": "1"}   
20  21          25  {"yhat_prob": "0.5124506", "yhat": "1"}   

             created_datetime           updated_datetime  
16 2022-01-14 20:37:26.694771 2022-01-14 20:37:26.694771  
17 2022-01-14 20:42:50.004279 2022-01-14 20:42:50.004279  
18 2022-01-16 13:22:03.390882 2022-01-16 13:22:03.390882  
19 2022-01-16 13:26:30.155778 2022-01-16 13:26:30.155778  
20 2022-01-16 14:43:24.954969 2022-01-16 14:43:24.954969