Homomorphic Artificial Intelligence (HAI)

Press Release

Presentation

The disadvantage of cloud solutions is that data transferred to the cloud can only be processed once it has been decrypted, which makes it possible for an unauthorised third party to access the data. The objective of the technology we are working on is to enable data analysis and artificial intelligence processing (machine learning, big data, deep learning) to be carried out directly on the encrypted versions of the data, while guaranteeing that the results are identical to those that would be obtained on the unencrypted data.

We have therefore designed HAI, standing for Homomorphic Artificial Intelligence. The mathematical design and security proof have been completed, the PoCs have been finalized, and an in-depth evaluation by data scientists is now complete.

Parameters, Security and Performances

The latest tests of our PoC confirm the mathematical theory and information mentioned above.

We have been able to implement a k-means algorithm directly on encrypted data:

  • Number of individuals: 100,000
  • Number of characteristics per individual: 50,000
  • 256-bit encryption key

We compared the classifications obtained without and with encryption using the Rand index. We systematically obtain R = 1, which indicates that encryption perfectly preserves the classification.
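As an illustration of this evaluation protocol, here is a minimal sketch (not our actual PoC code) of how such a comparison can be reproduced with scikit-learn: k-means is run on the plaintext data and on the encrypted data, and the two labelings are compared with the Rand index. The function `hai_encrypt` is a hypothetical placeholder, since the HAI encryption routine is not public; it is an identity map here so the script runs end to end.

```python
# Minimal sketch: k-means on plaintext vs. "encrypted" data, compared with
# the Rand index. `hai_encrypt` is a hypothetical placeholder for the
# non-public HAI encryption (identity map here so the script runs).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import rand_score

def hai_encrypt(X, key):
    return X  # placeholder for the similarity-preserving encryption

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 50))            # toy data (the PoC used 100,000 x 50,000)
X_enc = hai_encrypt(X, key=b"secret-key")

labels_plain = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
labels_enc = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_enc)

print("Rand index R =", rand_score(labels_plain, labels_enc))   # R = 1: identical partitions
```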

Homomorphic data analysis works with 128-, 256-, 392- or 512-bit secret keys.

The security of this encryption technique has been evaluated according to several scenarios, under the assumption that the attacker knows the encryption principle (only the secret key is unknown):

  • The attacker has a large number of encrypted data vectors (encrypted with the same key or with different keys).
  • The attacker knows the nature of the various characteristics describing an individual.

In all cases, cryptographic security has been proven to be at least equal to current encryption standards.

  • At best, the key can only be found by brute force; knowledge of plaintext/ciphertext pairs provides no usable information (see the keyspace sizes sketched below).
  • Reconstructing one or more plaintext data vectors from their encrypted versions (re-identification) has double-exponential complexity.
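To put the brute-force claim in numbers, the snippet below simply enumerates the size of the keyspace an exhaustive search would face for each of the supported key lengths listed above; it is purely illustrative.

```python
# Size of the keyspace an exhaustive (brute-force) search would face for
# each supported secret-key length.
for bits in (128, 256, 392, 512):
    print(f"{bits}-bit key: about {2**bits:.2e} candidate keys")
```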

The GDPR compliance analysis and validation has been finalized by our DPO/ethics officer. In particular, he pointed out that another application of this homomorphic data analysis technique is the ultra-fast search of data in an encrypted database or in encrypted texts. The very first tests confirm this possibility for an exact search (same text). Searching for similar or nearby data also seems possible and will give rise to verification PoCs very soon.
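For the exact-search use case, the general idea can be sketched as follows: under the secret key, equal plaintexts are mapped to equal tokens, so the server can match queries without ever seeing the plaintext. The snippet uses HMAC-SHA256 purely as an illustrative deterministic keyed transform; it is not the HAI/HbHAI construction, which has not been published.

```python
# Minimal sketch of exact-match search over protected records: equal
# plaintexts map to equal tokens under the secret key. HMAC-SHA256 is used
# purely as an illustrative keyed transform; it is NOT the HAI/HbHAI scheme.
import hashlib
import hmac

KEY = b"secret-key"

def token(text: str) -> bytes:
    return hmac.new(KEY, text.encode("utf-8"), hashlib.sha256).digest()

# The server-side index stores only tokens, never the plaintext values.
documents = ["alice", "bob", "carol"]
index = {token(doc): pos for pos, doc in enumerate(documents)}

# The client sends only the token of the search term.
query = token("bob")
print("match at position:", index.get(query))   # exact search: same text -> same token
```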

We have extended our technology to all supervised and deep learning algorithms, the main theoretical bottleneck having been successfully overcome. We are also aiming to implement our algorithms on hardware (SoC).

We are also working on the industrialization of this technique and on finding industrial collaborations. The scenario envisaged is as follows:

  • Implementation in an HSM containing the keys and the encryption logic (the HSM contains strong anti-tampering mechanisms).
  • Deployment of the HSM on behalf of the customer in a secure zone, in two possible use-cases (see the sketch after this list):
    • the HSM is on the customer’s premises, and once protected (black traffic), data is transferred to the cloud provider for ML/BD/DL processing (Figure 1);
    • the HSM is deployed directly at the cloud provider, in a secure zone under the customer’s control; the security of this scenario relies on the security and trust of the data encapsulation mechanism (SSH, TLS) used to send data to the HSM (Figure 2).
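The following schematic sketch illustrates the intended separation of roles: the HSM holds the secret key and the encryption logic, and only protected ("black") data crosses its boundary towards the cloud ML/BD/DL service. All class and function names are illustrative assumptions, not product interfaces.

```python
# Schematic sketch of the envisaged deployment: the HSM holds the key and
# the encryption logic, and only protected ("black") data leaves it for the
# cloud ML/BD/DL service. All names and interfaces are illustrative.
from dataclasses import dataclass

@dataclass
class HSM:
    key: bytes

    def protect(self, record: list) -> list:
        # Placeholder for the HAI encryption performed inside the HSM.
        return record

def cloud_ml_service(protected_records: list) -> str:
    # The provider processes protected records only; it never sees the key
    # or the plaintext.
    return f"model trained on {len(protected_records)} protected records"

# Use case 1: HSM on the customer's premises; use case 2: HSM in a secure
# zone at the cloud provider, reached over an encapsulated channel (SSH/TLS).
hsm = HSM(key=b"customer-secret")
plaintext_records = [[1.0, 2.0], [3.0, 4.0]]
black_traffic = [hsm.protect(r) for r in plaintext_records]
print(cloud_ml_service(black_traffic))
```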

In parallel with this R&D and industrialization work, we are studying the best way to protect the intellectual property of our technology (patents, copyright, or trade secrets). The choice we make will determine whether we publish all or part of our technology.

Evaluation for Deep Learning

We have just succeeded in training neural networks on data encrypted with HDAS. For validation purposes, we considered the Fashion-MNIST dataset, a drop-in replacement for the MNIST dataset. More precisely, it is a dataset of Zalando’s article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

To test HAI on neural networks, we applied the TensorFlow/Keras deep learning approach proposed by Aurélien Géron in his book “Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow” (3rd edition, chapter 10, section “Implementing MLPs with Keras”). We have not yet tuned the proposed algorithm with specific parameters; we simply applied it to the HDAS-encrypted Fashion-MNIST dataset with the same parameters.
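For reference, a minimal sketch of that Keras MLP (as described in Géron’s chapter 10) applied to Fashion-MNIST is given below. The function `hdas_encrypt` is a hypothetical placeholder for the non-public HDAS encryption step; it is an identity map here so the script runs unmodified.

```python
# Minimal sketch of the Keras MLP described in Géron (3rd ed., ch. 10),
# applied to Fashion-MNIST. `hdas_encrypt` is a hypothetical placeholder
# for the non-public HDAS encryption step (identity map here).
import tensorflow as tf

def hdas_encrypt(X):
    return X  # placeholder: stands in for the actual encryption of the dataset

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
X_train, X_test = hdas_encrypt(X_train / 255.0), hdas_encrypt(X_test / 255.0)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=[28, 28]),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(300, activation="relu"),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])

history = model.fit(X_train, y_train, epochs=30, validation_split=0.1)
print("test accuracy:", model.evaluate(X_test, y_test)[1])
```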

Here are the main results:

  • Both the training and validation steps are successful, with performance results close to those obtained on the original unencrypted dataset (accuracy of 0.8854 versus 0.8188). The next step will focus on fine-tuning the neural network hyperparameters, and we are confident that we will fully match the best performance results known to date.
  • The computing time is the same (no overhead due to data encryption).
  • Learning curves are similar (unencrypted data on the left, HAI-encrypted data on the right).
  • Prediction performances are the same.

Mathematical Formalization

We have formalized our new approach by means of a new class of secure hash functions with new security requirements.
The industrial application of HAI is called HbHAI (Hash-based Homomorphic Artificial Intelligence).

This new class of key-dependent hash functions naturally preserves the similarity properties that most AI algorithms rely on. Among their many features, HbHAI techniques can reduce data size with a compression ratio of at least 3. While strongly preserving data security and confidentiality, this reduces storage space and computation time for native, “ready-to-use” AI algorithms.
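To give an intuition of what “similarity-preserving, key-dependent, size-reducing” means, here is a conceptual stand-in (not the HbHAI construction, which is unpublished): a keyed random projection that compresses vectors by a factor of 3 while roughly preserving their cosine similarity.

```python
# Conceptual stand-in (NOT the HbHAI construction): a key-dependent random
# projection that compresses vectors by a factor of 3 while roughly
# preserving cosine similarity between individuals.
import numpy as np

def keyed_projection(key: int, d_in: int, d_out: int) -> np.ndarray:
    rng = np.random.default_rng(key)             # the key seeds the secret projection
    return rng.normal(size=(d_in, d_out)) / np.sqrt(d_out)

d_in, d_out = 1_500, 500                         # compression ratio of 3
P = keyed_projection(key=42, d_in=d_in, d_out=d_out)

rng = np.random.default_rng(0)
x = rng.normal(size=d_in)
y = x + 0.5 * rng.normal(size=d_in)              # a "similar" individual

hx, hy = x @ P, y @ P                            # compressed, key-dependent representations

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print("cosine similarity, original  :", round(cos(x, y), 3))
print("cosine similarity, projected :", round(cos(hx, hy), 3))
```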

This formalization will be presented at the 2025 International Conference of the AI Revolution: Research, Ethics and Society (AIR-RES 2025) in April in Las Vegas. The technical evaluation of HbHAI techniques will soon be presented at CyberWiseCon 2025 in Vilnius. The first HbHAI-protected datasets will be made publicly available in June 2025.