Build a Spam Checker with OpenAI and GPT-3 | by Paulo Taylor | November, 2022

A simple tutorial on creating a spam filter using the OpenAI fine-tuning API

By default, the OpenAI API offers several AI models (or engines), each suited to different use cases.

The fine-tuning feature lets you take one of OpenAI’s models/engines, supply it with new training data, and build a new “fine-tuned” model.
This is the feature we will use to create our own spam checker.

But first, we’ll need some training data to build our fine-tuned model.
In Call Assistant we can access data from existing robocalls that our users screened.

We will need examples of telemarketers and robocalls as well as legitimate calls, so that the model can learn to classify text as either spam or not spam.

Here’s an example of a telemarketer trying to sell some sort of credit solution to a user:

Hi, it’s Sarah again with the credit pros. I’ve tried calling you several times, but no luck connecting. We’ve helped millions of people improve their credit, and we’d love to help you too. So call me back on this number as soon as possible. Looking forward to chatting. Thank you.

And here is an example of a legitimate call:

Hi, I am Spencer from the office of the children’s dentist Dr. Porter regarding Mr. Smith. He is going to go on his cleanliness journey next month. Do you like to schedule that appointment? Please? Call us at xxx-xxx-xxx. Once again, our phone number is xxx-xxx-xxx. have a wonderful day. Bye.

We need to add a separator between the prompt and the completion (the label). For this example, we will use \n\n###\n\n, as suggested by OpenAI in their tutorials. Using this data, we have to prepare a JSONL file in the format the fine-tuning API expects: one JSON object per line, with a “prompt” field (the call transcript followed by the separator) and a “completion” field (the label). The more examples you add, the better.
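As a rough illustration of that file format, here is a minimal Java sketch that turns transcript/label pairs into JSONL lines. The class name TrainingFileBuilder, its helper methods, and the “not spam” label are assumptions made for this example; only the “spam” label and the separator come from the article. The escaping is deliberately minimal and only meant for plain-text transcripts.

```java
import java.util.List;

public class TrainingFileBuilder {
    // Separator appended to every prompt, as described in the text above.
    static final String SEPARATOR = "\n\n###\n\n";

    // Minimal JSON string escaping, enough for plain call transcripts.
    // Backslashes are escaped first so later replacements are not double-escaped.
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r")
                .replace("\t", "\\t");
    }

    // Builds one JSONL line: prompt = transcript + separator, completion = label.
    static String toJsonLine(String transcript, String label) {
        return "{\"prompt\": \"" + escapeJson(transcript + SEPARATOR)
             + "\", \"completion\": \"" + escapeJson(label) + "\"}";
    }

    public static void main(String[] args) {
        // Shortened versions of the two transcripts quoted above.
        // The "not spam" label is a hypothetical name for legitimate calls.
        List<String[]> examples = List.of(
            new String[] {"Hi, it's Sarah again with the credit pros. Call me back as soon as possible.", "spam"},
            new String[] {"Hi, I am Spencer from the office of the children's dentist Dr. Porter.", "not spam"}
        );
        // One JSON object per line; redirect the output to file.jsonl.
        for (String[] example : examples) {
            System.out.println(toJsonLine(example[0], example[1]));
        }
    }
}
```

Running it with the output redirected (for example, java TrainingFileBuilder.java > file.jsonl) would produce a file that can be passed to the fine-tuning command.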

After compiling the file, we need to start the fine-tuning process. You’ll need a lot of data, the OpenAI CLI, and an API key:

openai api -k sk-YOUR_KEY fine_tunes.create -t file.jsonl -m ada

The fine-tuning may take some time to complete, depending on the model and the amount of training data you have. You can use this command to track progress:

openai api -k sk-YOUR_KEY fine_tunes.follow -i ft-aBcDeFgHiJkLmNoP
...
[18:26:40] Fine-tune enqueued. Queue number: 0
[18:26:40] Fine-tune is in the queue. Queue number: 0
[18:29:14] Fine-tune started
[18:30:52] Completed epoch 1/4
[18:32:11] Completed epoch 2/4
[18:33:30] Completed epoch 3/4
[18:34:49] Completed epoch 4/4
[18:35:08] Uploaded model: ada:ft-x-xxxx-xx-xx
[18:35:09] Uploaded result file: file-aBcDeFgHiJ
[18:35:09] Fine-tune succeeded

Now our new model is ready to use. In the following example, I use a sentence similar to the telemarketer transcript above, ending with the ### separator, and the model classifies the text as spam:

openai api -k sk-YOUR_KEY completions.create -m ada:ft-x-xxxx-xx-xx -M 4 -p "Hello, this is John from Finance Plus. I've called before,  We've helped other individuals like you improve their credit. Please give me a call later.###"

The answer would be something like this:

Hello, this is John from Finance Plus. I've called before,  We've helped other individuals like you improve their credit. Please give me a call later.###spam
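Because the output echoes the prompt before the label, an application has to pull the label out of the text. A minimal Java sketch of that step follows; the class name LabelParser and its methods are hypothetical. It checks only whether the label starts with “spam”, on the assumption that the model may emit a few extra tokens after the label when no stop sequence is set.

```java
public class LabelParser {
    // Returns whatever follows the last occurrence of the separator, trimmed.
    // If the separator is missing, the whole text is treated as the label.
    static String extractLabel(String output, String separator) {
        int idx = output.lastIndexOf(separator);
        if (idx < 0) {
            return output.trim();
        }
        return output.substring(idx + separator.length()).trim();
    }

    // True when the extracted label begins with "spam" (case-insensitive).
    // A label such as "not spam" does not match because it starts with "not".
    static boolean isSpam(String output) {
        return extractLabel(output, "###").toLowerCase().startsWith("spam");
    }

    public static void main(String[] args) {
        String output = "Please give me a call later.###spam";
        System.out.println("Label: " + extractLabel(output, "###"));
        System.out.println("Spam?  " + isSpam(output));
    }
}
```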

If you use Java, you can call the same Completions API over HTTP from your own code.
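The sketch below shows one way this could look with the HTTP client built into Java 11 and later. It is not the author's original code. It assumes the /v1/completions REST endpoint, an API key in the OPENAI_API_KEY environment variable, and the placeholder model name from the CLI example; the class name SpamChecker, its methods, and the temperature setting are choices made for this illustration. The response is returned as raw JSON, with the label in the text of the first choice.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SpamChecker {
    static final String ENDPOINT = "https://api.openai.com/v1/completions";
    // Placeholder model name copied from the CLI example; replace with your own.
    static final String MODEL = "ada:ft-x-xxxx-xx-xx";

    // Minimal JSON string escaping, enough for plain call transcripts.
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r")
                .replace("\t", "\\t");
    }

    // Builds the request body: the prompt ends with ###, as in the CLI example,
    // and max_tokens is 4 to mirror the -M 4 flag. temperature 0 is an assumption.
    static String buildRequestBody(String model, String transcript) {
        return "{\"model\": \"" + escapeJson(model)
             + "\", \"prompt\": \"" + escapeJson(transcript + "###")
             + "\", \"max_tokens\": 4, \"temperature\": 0}";
    }

    // Sends the transcript to the API and returns the raw JSON response body.
    static String complete(String apiKey, String transcript) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(buildRequestBody(MODEL, transcript)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("API returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String apiKey = System.getenv("OPENAI_API_KEY");
        if (apiKey == null) {
            System.out.println("Set OPENAI_API_KEY to run this example.");
            return;
        }
        System.out.println(complete(apiKey,
                "Hello, this is John from Finance Plus. Please give me a call later."));
    }
}
```

In a real application you would parse the response with a JSON library instead of reading it as a string, and reuse a single HttpClient instance across requests.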

Performance-wise, the Completion API seems to take around 500-900 milliseconds per request, although in my experience it gets faster the more you use it.

With this approach, we use GPT-3 to scan call transcripts for spam while calls are being screened, and our call assistant notifies users in real time when a call looks like spam.

Thanks for reading.
