Inference on a server

In this approach, once the model is trained, we host it on a server and the mobile application calls it remotely.

The model can be hosted on a cloud machine, on a local (on-premises) server, or by a hosted machine learning provider. The server publishes an endpoint URL that the application calls to obtain predictions, passing the required input data to the service with each request.
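
To make this request/response flow concrete, here is a minimal sketch using only the Python standard library. It assumes a JSON-over-HTTP interface; the `/predict` path, the `{"features": [...]}` request shape, the `{"prediction": ...}` response, and the linear `toy_model` are hypothetical placeholders, not the API of any particular hosting provider. The server runs locally on a free port so the example is self-contained; in a real deployment the client would call the URL published by the cloud or provider endpoint.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model: a linear scorer with made-up weights.
WEIGHTS = [0.5, -1.0, 2.0]
BIAS = 0.1


def toy_model(features):
    """Return a score for one input row (illustrative only)."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


class PredictHandler(BaseHTTPRequestHandler):
    """Server side: answers POST /predict with a JSON body {"features": [...]}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        payload = json.dumps({"prediction": toy_model(body["features"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example


def predict(endpoint_url, features, timeout=5.0):
    """Client side: send the input data to the endpoint and return the prediction."""
    data = json.dumps({"features": features}).encode("utf-8")
    request = urllib.request.Request(
        endpoint_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["prediction"]


def start_server(host="127.0.0.1", port=0):
    """Start the server on a free port in a background thread; return (server, url)."""
    server = HTTPServer((host, port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    bound_host, bound_port = server.server_address
    return server, f"http://{bound_host}:{bound_port}/predict"
```

Usage: `server, url = start_server()` followed by `predict(url, [1.0, 2.0, 3.0])` returns about 4.6 (0.5*1 - 1*2 + 2*3 + 0.1); call `server.shutdown()` when done. The client only needs the URL and the request format, which is what lets the server side change independently.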

Performing inference on a server keeps the mobile application simple. The model can be retrained and improved periodically without redeploying the mobile client application, and new features can be added to the model easily. As long as the endpoint's request and response format stays the same, model changes do not require users to upgrade the mobile application.
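
One way to picture why the client does not need redeploying: the endpoint keeps a fixed request/response format while the model behind it is swapped. The sketch below assumes a hypothetical in-process `ModelRegistry` and `handle_predict` function; a real deployment would typically load the new model from storage and roll it out across servers, which is not shown.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ModelVersion:
    version: str
    predict: Callable[[List[float]], float]


class ModelRegistry:
    """Holds the model currently served by the endpoint (hypothetical helper)."""

    def __init__(self, initial: ModelVersion):
        self._lock = threading.Lock()
        self._current = initial

    def deploy(self, new_model: ModelVersion) -> None:
        # Replace the served model under a lock; clients keep calling the same URL.
        with self._lock:
            self._current = new_model

    def current(self) -> ModelVersion:
        with self._lock:
            return self._current


def handle_predict(registry: ModelRegistry, request: dict) -> dict:
    """Endpoint logic: the request/response format stays fixed across model versions."""
    model = registry.current()
    return {"prediction": model.predict(request["features"]), "model_version": model.version}


registry = ModelRegistry(ModelVersion("v1", lambda xs: sum(xs)))
request = {"features": [1.0, 2.0]}
print(handle_predict(registry, request))  # {'prediction': 3.0, 'model_version': 'v1'}
registry.deploy(ModelVersion("v2", lambda xs: sum(xs) / len(xs)))
print(handle_predict(registry, request))  # {'prediction': 1.5, 'model_version': 'v2'}
```

The same `request` dictionary works before and after `deploy`, which mirrors a mobile client that sends the same payload to the same URL while the backend model changes.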

The benefits of using this approach are as follows:

The mobile application remains relatively simple.
The model can be updated at any time without redeploying the client application.
It is easy to support multiple OS platforms, because the complex inference logic does not have to be rewritten for each platform; everything is done in the backend.

When we choose this approach, we need to be careful about the following:

The application works only when online, because it must connect to the backend components to carry out inference.
The server hardware and software must be maintained and kept up and running, and the service needs to scale with the number of users. Scaling adds the cost of managing multiple servers and ensuring that they are always available.
Users need to transmit their data to the backend for inference. If the data is large, users may experience performance issues and may also have to pay for the data they transfer.
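
Where the input is large, one possible mitigation for the data-transfer concern is to send compact JSON and compress the request body. The sketch below gzips the payload and marks it with the standard HTTP `Content-Encoding: gzip` header; the `encode_payload` and `decode_payload` helpers are hypothetical, and the server has to be written to decompress such bodies. Compression helps mostly with repetitive data such as numeric feature arrays; already-compressed inputs like JPEG images will barely shrink.

```python
import gzip
import json


def encode_payload(features, compress=True):
    """Client side: serialize the request body as compact JSON, optionally gzipped.

    Returns (body_bytes, headers) for an HTTP POST to a (hypothetical) inference endpoint.
    """
    raw = json.dumps({"features": features}, separators=(",", ":")).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if compress:
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(raw), headers
    return raw, headers


def decode_payload(body, headers):
    """Server side: undo the optional gzip encoding and parse the JSON body."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return json.loads(body)["features"]


features = [0.0] * 10_000  # a large, highly repetitive input
plain, _ = encode_payload(features, compress=False)
packed, headers = encode_payload(features)
print(len(plain), len(packed))  # the gzipped body is far smaller here
```

Compression trades a little CPU time on the device for fewer bytes over the network; whether that pays off depends on how compressible the input is and how the user's data plan is billed.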