Training a computer vision model to detect voting ballots using only synthetic data is possible. Could it lead to more reliable and secure voting?

It’s November 2020, and America is voting.

With mail-in voting likely to reach historic numbers, our voting systems are under more scrutiny than ever. The concerns center on the accuracy and speed of the current state-of-the-art voting methods, and they feed voter distrust in voting technology: even if you go and vote, is your vote being counted correctly?

We decided to tackle the problems presented by optical-scan ballots. Optical-scan paper ballot systems are still widely used to tabulate votes. Unfortunately, as things stand, optical-scan ballots have been notoriously unreliable [1]. When one of these machines fails, the results are verified manually (i.e., someone looks at the ballot).

The Experiment

Our idea to get around these problems: a simple CV (computer vision) model to count optical-scan ballots. At Zumo Labs we create synthetic data to train the next generation of CV models, so we are always on the lookout for compelling CV use cases. In an ideal world, a human would check every ballot, but that isn't really feasible. Computer vision systems have historically been called upon to bring technology up to human-like accuracy.

As engineers, we believe the more fail-safes in a system, the better the system design. This system could be used alongside the current systems or even replace them down the line.

Counting optical-scan ballots is a good use case. A model should not be affected by ablations (physical wear on the ballot) or by optical-scanner calibration issues. More importantly, the model we trained can be deployed on any device with an RGB camera, and therefore would not require costly custom hardware.

Reliable, secure voting is possible, and here is how we did it.

Election Time in Bikini Bottom

For this experiment we simulated an election in Bikini Bottom, where Squidward Tentacles took on the incumbent Patrick Star.

For the model, we decided on a Mask R-CNN with a ResNet-FPN backbone. This configuration comes from Detectron2, which is made available by Facebook Research and provides sample configurations for popular architectures [2]. We trained the model from scratch to perform instance segmentation and bounding box detection for two categories, Patrick and Squidward, entirely on synthetic data with bounding boxes and segmentation masks for each filled-in ballot.
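To make the training setup concrete, here is a minimal sketch of how one synthetic ballot image might be described as a training record. The dict layout follows Detectron2's standard "dataset dict" convention (`file_name`, `annotations` with `bbox`, `segmentation`, `category_id`); the file name, coordinates, and helper function are made up for illustration, not taken from our actual pipeline.

```python
# Map the two ballot categories to integer ids for the detector.
CATEGORIES = {"patrick": 0, "squidward": 1}

def make_record(image_id, file_name, width, height, marks):
    """Build one Detectron2-style training record.

    `marks` is a list of (category_name, x0, y0, x1, y1) tuples,
    one per filled-in bubble in the synthetic render.
    """
    annotations = []
    for name, x0, y0, x1, y1 in marks:
        annotations.append({
            "bbox": [x0, y0, x1, y1],  # XYXY pixel coordinates
            "bbox_mode": 0,            # BoxMode.XYXY_ABS in Detectron2
            # A rectangle as a flat polygon [x, y, x, y, ...] stands in
            # here for the real per-pixel segmentation mask.
            "segmentation": [[x0, y0, x1, y0, x1, y1, x0, y1]],
            "category_id": CATEGORIES[name],
        })
    return {
        "file_name": file_name,
        "image_id": image_id,
        "width": width,
        "height": height,
        "annotations": annotations,
    }

record = make_record(
    image_id=0,
    file_name="ballot_0000.png",  # hypothetical synthetic render
    width=1024,
    height=1448,
    marks=[("squidward", 412, 620, 452, 660)],
)
```

The advantage of generating data synthetically is that these labels come for free: the renderer already knows exactly where every bubble is, so no human annotation is needed.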

Figure 1: Our synthetically-trained model’s detection on real images.

The Data

The data set we used was split roughly in half: half votes for Squidward, half votes for Patrick.

Figure 2: (Left) real image, (Center) prediction, (Right) synthetic image

You might notice the varied visual appearance of the background: this is a technique called domain randomization. Domain randomization allows the model to ignore the background to focus on what matters (i.e., the ballot).
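As a toy sketch of domain randomization: each synthetic render draws its nuisance parameters (background color, ballot pose, lighting) from wide random distributions, so the only stable signal across the data set is the ballot itself. The parameter names and ranges below are illustrative, not the ones our internal tools actually use.

```python
import random

def sample_scene_params(rng):
    """Draw randomized scene parameters for one synthetic render."""
    return {
        # Background: a random RGB color (a real pipeline might also
        # swap in random textures or photographs).
        "background_rgb": tuple(rng.random() for _ in range(3)),
        # Ballot pose: small random rotation and pixel offset.
        "rotation_deg": rng.uniform(-15.0, 15.0),
        "offset_px": (rng.randint(-40, 40), rng.randint(-40, 40)),
        # Lighting: random intensity multiplier.
        "light_intensity": rng.uniform(0.5, 1.5),
    }

rng = random.Random(42)  # seeded for reproducibility
scenes = [sample_scene_params(rng) for _ in range(1000)]
```

Because nothing about the background is consistent from image to image, the model has no incentive to memorize it; the ballot is the only thing worth learning.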

Our data was generated using a combination of the popular Blender software and some internal tools, which we hope to open source in the future (keep an eye out!). The data set is available for free on our website.

Training the Model: Edge Cases

Let’s return to our undersea hamlet, the testing ground of our experiment. After the election in Bikini Bottom begins, the voting committee realizes that voters have misunderstood the instructions: instead of filling in the bubbles, they have been circling the boxes on the ballot.

Back to the lab! Our current model (the one trained above) was unable to detect these circles, since it was only trained on filled boxes. The fix was quick and easy: we generated a new batch of synthetic data covering this case in record time.

This is one of the great advantages of synthetic data: once an edge case is detected, it is simple to generate a data set and retrain the model to handle it. Below you can see that the improved model is able to detect the incorrect format, showing how synthetic data helps you tackle problems you haven’t run into yet.
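Covering an edge case like this can be as small a change as adding one more randomized parameter to the generator. Here is a hypothetical sketch in which `mark_style` becomes part of the ballot configuration, so a fresh batch mixes correctly filled bubbles with circled boxes; the names and fields are assumptions for illustration.

```python
import random

# "circled" is the newly observed edge case; "filled" is the correct format.
MARK_STYLES = ("filled", "circled")

def sample_ballot_config(rng, styles=MARK_STYLES):
    """Draw one randomized ballot configuration for the generator."""
    return {
        "candidate": rng.choice(("patrick", "squidward")),
        "mark_style": rng.choice(styles),
        # How sloppily the mark is drawn (0.0 = perfectly neat).
        "mark_jitter": rng.uniform(0.0, 0.3),
    }

rng = random.Random(0)
batch = [sample_ballot_config(rng) for _ in range(500)]
styles_seen = {cfg["mark_style"] for cfg in batch}
```

Rendering a new batch from these configs and fine-tuning on it is a matter of hours, not the weeks it would take to collect and label real examples of a mistake voters only just started making.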

Figure 3: Our model trained on the added synthetic data now recognizes the voter’s choices

Voting Issues: Privacy

Voters want their votes to be secret and secure. We live in a world where people are increasingly conscious of their privacy and their digital footprint. Many CV systems are trained on previously collected data, which raises privacy concerns: end users and companies alike are often unaware of the ramifications.

We trained the above models on 100% synthetic data. That means no real ballots ever had to be recorded to build a training set. We believe a CV model can make counting ballots more reliable, and, when trained on 100% synthetic data, it can also help ensure secret and secure voting.
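Downstream of the detector, the counting step itself can keep a human in the loop, in the fail-safe spirit described earlier. Here is a minimal sketch, under assumed names and an illustrative confidence threshold, of tallying one vote per ballot from the model's highest-confidence detection while deferring ambiguous ballots to manual review.

```python
CONFIDENCE_THRESHOLD = 0.9  # illustrative cutoff, not a tuned value

def tally(ballots, threshold=CONFIDENCE_THRESHOLD):
    """Count votes from detections.

    `ballots` maps a ballot id to a list of (candidate, score)
    detections produced by the model for that ballot.
    """
    counts = {"patrick": 0, "squidward": 0}
    needs_review = []
    for ballot_id, detections in ballots.items():
        if not detections:
            needs_review.append(ballot_id)  # nothing detected at all
            continue
        candidate, score = max(detections, key=lambda d: d[1])
        if score < threshold or len(detections) > 1:
            # Low-confidence or overvoted ballot: defer to a human.
            needs_review.append(ballot_id)
        else:
            counts[candidate] += 1
    return counts, needs_review

counts, needs_review = tally({
    "b1": [("squidward", 0.98)],
    "b2": [("patrick", 0.97)],
    "b3": [("patrick", 0.55)],                       # low confidence
    "b4": [("patrick", 0.95), ("squidward", 0.93)],  # overvote
})
# b1 and b2 are counted; b3 and b4 are routed to manual review.
```

The model never replaces the human check here; it just shrinks the pile of ballots that need one.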


We hope by now that we have built some trust in how a computer vision model could be used as a component in the voting process. On top of that, we hope to have shown how synthetic data allows us to train these models to adapt to ever-changing conditions (capture all the edge cases quickly!) and to toss privacy concerns out the window.

If you have your own theories, would like to test a hunch with a model but lack the training data, or disagree with our thoughts and conclusions about the voting process, let us know! We would love to chat.

Thanks for reading and make your voice heard this November by voting! [3]

(By Norman Ponte. Questions, comments, suggestions? Send them his way, or book a demo today.)


This post is also available on Towards Data Science.



[1] Voting Technology. MIT Election Lab.

[2] Detectron2. Facebook Research.

[3] Go Vote!
