

GPT-4o Mini

Aug 03, 2024

Blog Credit : Trupti Thakur

Image Courtesy : Google

GPT-4o Mini

OpenAI has released GPT-4o mini, a new AI model that costs less and is accessible to a wider range of people. It arrives amid intense competition in the AI space, especially from big names like Google and Meta.

What is GPT-4o mini?

GPT-4o mini is a smaller, less expensive version of OpenAI’s larger language models. It costs 15 cents per million input tokens and 60 cents per million output tokens, more than 60% cheaper than its predecessor, GPT-3.5 Turbo.

Performance Metrics

OpenAI says that GPT-4o mini does better than GPT-4 on chat preferences and scores 82% on the MMLU (Massive Multitask Language Understanding) benchmark. MMLU tests how well a model can understand and apply knowledge across a wide range of subjects. By comparison, Google’s Gemini Flash scores 77.9% and Anthropic’s Claude Haiku scores 73.8%, which makes GPT-4o mini very competitive in language understanding.

Availability

The mini model needs less computing power, so it can work for companies that don’t have a lot of resources, and its efficiency allows more businesses to adopt generative AI in their work. ChatGPT users on the Free, Plus, and Team tiers can access GPT-4o mini instead of GPT-3.5 Turbo right away, with business users gaining access the following week. The model’s knowledge cutoff is October 2023.

More About OpenAI

OpenAI was founded in December 2015 as a non-profit organization. Its first major release was OpenAI Gym, a toolkit for developing reinforcement learning algorithms. In 2019, OpenAI’s GPT-2 model caused a stir because it could generate text that was difficult to distinguish from human writing. In 2020, OpenAI published guidelines for the responsible use of its AI systems. The company’s name reflects its aim of encouraging open collaboration in AI research, though it switched to a capped-profit model to raise the money it needs to fund its work.

What is OpenAI’s GPT-4o?

GPT-4o is OpenAI’s latest LLM. The ‘o’ in GPT-4o stands for “omni”, from the Latin for “all”, referring to the fact that this new model can accept prompts that mix text, audio, images, and video. Previously, the ChatGPT interface used separate models for different content types.

For example, when speaking to ChatGPT via Voice Mode, your speech would be converted to text using Whisper, a text response would be generated using GPT-4 Turbo, and that text response would be converted to speech with TTS.
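
To make the contrast concrete, here is a rough sketch of that older three-model pipeline using the OpenAI Python SDK. The file names, model IDs, and voice are illustrative assumptions, not a quote of OpenAI’s internal implementation.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# 1. Speech -> text with Whisper
with open("question.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# 2. Text -> text with GPT-4 Turbo
reply = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3. Text -> speech with a TTS model
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",  # illustrative preset voice
    input=reply.choices[0].message.content,
)
with open("answer.mp3", "wb") as out:
    out.write(speech.read())
```

Each hop in this chain adds latency and loses information such as tone of voice, which is exactly what the unified GPT-4o model is meant to avoid.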

What is GPT-4o mini?

On July 18, 2024, OpenAI announced GPT-4o mini, a cost-efficient small model aimed at broadening AI accessibility. This new, smaller model reportedly outperforms GPT-4 on chat preferences on the LMSYS Chatbot Arena leaderboard.

GPT-4o mini is priced at 15 cents per million input tokens and 60 cents per million output tokens, significantly reducing costs compared to previous models. OpenAI claims the model excels in reasoning, math, coding, and multimodal tasks, outperforming other small models. It supports text and vision inputs, with future plans for image, video, and audio. Built-in safety measures ensure responsible use, and the model is available in the Assistants API, Chat Completions API, and Batch API.
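
As a minimal sketch of how that might look in practice, the snippet below sends a mixed text-and-image prompt to GPT-4o mini through the Chat Completions API; the image URL is a placeholder, not a real asset.

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize what this chart shows."},
                # Placeholder URL -- replace with a publicly reachable image.
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)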

What Makes GPT-4o Different from GPT-4 Turbo?

The all-in-one model approach means that GPT-4o overcomes several limitations of the previous voice interaction capabilities.

  1. Tone of voice is now considered, facilitating emotional responses
  2. Lower latency enables real-time conversations
  3. Integrated vision enables descriptions of a camera feed
  4. Better tokenization for non-Roman alphabets provides greater speed and value for money
  5. Rollout to the free plan
  6. Launch of the ChatGPT desktop app

How Does GPT-4o Work?

Many content types, one neural network

Details of how GPT-4o works are still scant. The only detail that OpenAI provided in its announcement is that GPT-4o is a single neural network that was trained on text, vision, and audio input.

This new approach differs from the previous technique of having separate models trained on different data types.

However, GPT-4o isn’t the first model to take a multi-modal approach. In 2022, Tencent Lab created SkillNet, a model that combined LLM transformer features with computer vision techniques to improve the ability to recognize Chinese characters.

In 2023, a team from ETH Zurich, MIT, and Stanford University created WhisBERT, a variation on the BERT series of large language models. While not the first, GPT-4o is considerably more ambitious and powerful than either of these earlier attempts.

Is GPT-4o a radical change from GPT-4 Turbo?

How radical the changes are to GPT-4o’s architecture compared to GPT-4 Turbo depends on whether you ask OpenAI’s engineering or marketing teams. In April, a bot named “im-also-a-good-gpt2-chatbot” appeared on LMSYS’s Chatbot Arena, a leaderboard for the best generative AIs. That mysterious AI has now been revealed to be GPT-4o.

The “gpt2” part of the name is important. Not to be confused with GPT-2, a predecessor of GPT-3.5 and GPT-4, the “2” suffix was widely regarded to mean a completely new architecture for the GPT series of models.

Evidently, someone in OpenAI’s research or engineering team thinks that combining text, vision, and audio content types into a single model is a big enough change to warrant the first version number bump in six years.

On the other hand, the marketing team has opted for a relatively modest naming change, continuing the “GPT-4” convention.

GPT-4o Performance vs Other Models

OpenAI released benchmark figures of GPT-4o compared to several other high-end models.

  • GPT-4 Turbo
  • GPT-4 (initial release)
  • Claude 3 Opus
  • Gemini Pro 1.5
  • Gemini Ultra 1.0
  • Llama 3 400B

Of these, only three models really matter for comparison. GPT-4 Turbo, Claude 3 Opus, and Gemini Pro 1.5 have spent the last few months angling for the top spot on the LMSYS Chatbot Arena leaderboard.

Llama 3 400B may be a contender in the future, but it has not finished training yet. For that reason, we only present results for these three models and GPT-4o.

The results of six benchmarks were used.

  • Massive Multitask Language Understanding (MMLU). Tasks on elementary mathematics, US history, computer science, law, and more. To attain high accuracy on this test, models must possess extensive world knowledge and problem-solving ability.
  • Graduate-Level Google-Proof Q&A (GPQA). Multiple-choice questions written by domain experts in biology, physics, and chemistry. The questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 74% accuracy.
  • MATH. Middle school and high school mathematics problems.
  • HumanEval. A test of the functional correctness of computer code, used for checking code generation (see the illustrative sketch after this list).
  • Multilingual Grade School Math (MGSM). Grade school mathematics problems, translated into ten languages, including underrepresented languages like Bengali and Swahili.
  • Discrete Reasoning Over Paragraphs (DROP). Questions that require understanding complete paragraphs, for example by adding, counting, or sorting values spread across multiple sentences.
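
For a sense of what HumanEval-style functional-correctness testing looks like, here is an illustrative task in that style (our own example, not quoted from the benchmark): the model receives a signature and docstring, and its completion counts as correct only if it passes the unit tests.

```python
def has_close_elements(numbers: list[float], threshold: float) -> bool:
    """Return True if any two numbers in the list are closer than threshold."""
    # A reference solution; in the benchmark, the model's completion goes here.
    for i, a in enumerate(numbers):
        for b in numbers[i + 1:]:
            if abs(a - b) < threshold:
                return True
    return False

# Functional-correctness check: the completion passes only if every assert holds.
assert has_close_elements([1.0, 2.0, 3.9], 0.3) is False
assert has_close_elements([1.0, 2.8, 3.0], 0.3) is True
```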

GPT-4o gets the top score in four of the benchmarks, though it is beaten by Claude 3 Opus on the MGSM benchmark and by GPT-4 Turbo on the DROP benchmark.

If you look closely at the GPT-4o numbers compared to GPT-4 Turbo, you’ll see that the performance increases are only a few percentage points.

It’s an impressive boost for one year later, but it’s far from the dramatic jumps in performance from GPT-1 to GPT-2 or GPT-2 to GPT-3.

Being 10% better at reasoning about text year-on-year is likely to be the new normal. The low-hanging fruit has been picked, and it’s just difficult to continue with big leaps in text reasoning.

On the other hand, what these LLM benchmarks don’t capture is AI’s performance on multi-modal problems. The concept is so new that we don’t have any good ways of measuring how good a model is across text, audio and vision.

Overall, GPT-4o’s performance is impressive, and it shows promise for the new approach of multimodal training.

What Are GPT-4o Use-Cases?

  1. GPT-4o for data analysis and coding tasks
  2. GPT-4o for real-time translation
  3. Roleplay with GPT-4o
  4. GPT-4o for assisting visually impaired users

Hands-On With GPT-4o

I’ve had access to some of GPT-4o’s new features since just after the announcement (sadly, no voice chat yet), and I’ve been impressed with many of its outputs. Responses seem faster and more consistent, and it seems to understand my requests better than it did previously. That’s not to say it’s been perfect, though.

Here are some examples of the interactions I had with ChatGPT-4o:

  • Data analysis task
  • Image analysis
  • Image creation

GPT-4o Limitations & Risks

Regulation for generative AI is still in its early stages; the EU AI Act is the only notable legal framework in place so far. That means that companies creating AI need to make some of their own decisions about what constitutes safe AI.

OpenAI has a preparedness framework that it uses to determine whether or not a new model is fit to release to the public.

The framework tests four areas of concern.

  • Cybersecurity. Can AI increase the productivity of cybercriminals and help create exploits?
  • CBRN. Can the AI assist experts in creating chemical, biological, radiological, or nuclear threats?
  • Persuasion. Can the AI create (potentially interactive) content that persuades people to change their beliefs?
  • Model autonomy. Can the AI act as an agent, performing actions with other software?

Each area of concern is graded Low, Medium, High, or Critical, and the model’s score is the highest of the grades across the four categories.
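
As a minimal sketch of that aggregation rule (our own illustration with hypothetical per-category grades, not OpenAI’s code):

```python
SCALE = ["Low", "Medium", "High", "Critical"]

def overall_score(grades: dict[str, str]) -> str:
    """The overall score is the highest grade across all tracked categories."""
    return max(grades.values(), key=SCALE.index)

# Hypothetical per-category grades, for illustration only.
grades = {
    "Cybersecurity": "Low",
    "CBRN": "Medium",
    "Persuasion": "Medium",
    "Model autonomy": "Low",
}
print(overall_score(grades))  # -> "Medium"
```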

OpenAI promises not to release a model that is of critical concern, though this is a relatively low safety bar: under its definitions, a critical concern corresponds to something that would upend human civilization. GPT-4o comfortably avoids this, scoring Medium concern.

Imperfect output

As with all generative AI, the model doesn’t always behave as intended. Computer vision is not perfect, so interpretations of an image or video are not guaranteed to be accurate.

Likewise, transcriptions of speech are rarely 100% correct, particularly if the speaker has a strong accent or technical words are used.

OpenAI provided a video of some outtakes where GPT-4o did not work as intended.

Notably, translation between two non-English languages was one of the cases where it failed. Other problems included unsuitable tone of voice (being condescending) and speaking the wrong language.

Accelerated risk of audio deepfakes

The OpenAI announcement notes that “We recognize that GPT-4o’s audio modalities present a variety of novel risks.” In a lot of ways, GPT-4o can accelerate the rise of deepfake scam calls, where AI impersonates celebrities, politicians, and people’s friends and family. This is a problem that will only get worse before it is fixed, and GPT-4o has the power to make deepfake scam calls even more convincing.

To mitigate this risk, audio output is only available in a selection of preset voices.

Presumably, technically minded scammers can use GPT-4o to generate text output and then use their own text-to-speech model, though it’s unclear if that would still gain the latency and tone-of-voice benefits that GPT-4o provides.

GPT-4o Release Date

As of July 19, 2024, many features of GPT-4o have been rolled out gradually. Text and image capabilities have been added for many users on the Plus and free plans, including ChatGPT accessed in mobile browsers. Likewise, the text and vision features of GPT-4o are already available via the API.

These features of GPT-4o are broadly available in the iOS and Android mobile apps. However, several pieces are still pending: the new Voice Mode, which will be updated to use GPT-4o; audio and video capabilities in the API; and availability of the new model in the Mac desktop app. Access to the desktop app is gradually being rolled out to Plus users, and a Windows desktop application is planned for later this year.

Below is a summary of the GPT-4o release dates:

  • Announcement of GPT-4o: May 13, 2024
  • GPT-4o text and image capabilities rollout: Starting May 13, 2024
  • GPT-4o availability in free tier and Plus users: Starting May 13, 2024
  • API access for GPT-4o (text and vision): Starting May 13, 2024
  • GPT-4o availability on Mac desktop for Plus users: Coming weeks (starting May 13, 2024)
  • New version of Voice Mode with GPT-4o in alpha: Coming weeks/months (after May 13, 2024)
  • API support for audio and video capabilities: Coming weeks/months (after May 13, 2024)
  • GPT-4o mini: July 18, 2024

However, after the controversy caused by the demo of the new voice capabilities, it seems OpenAI is being cautious about the release. According to their updated blog, ‘Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities. For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies.’

How Much Does GPT-4o Cost?

Despite being faster than GPT-4 Turbo and having better vision capabilities, GPT-4o is around 50% cheaper than its predecessor. According to the OpenAI website, using the model costs $5 per million input tokens and $15 per million output tokens.
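
To see what those prices mean for a single request, here is a rough cost calculator using the per-million-token prices quoted in this post for GPT-4o and GPT-4o mini; actual prices may change, so treat the figures as illustrative.

```python
# (input $/1M tokens, output $/1M tokens), as quoted in this post
PRICES = {
    "gpt-4o": (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 500):.4f}")
# gpt-4o:      $0.0175
# gpt-4o-mini: $0.0006
```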

How Can I Access GPT-4o in the Web Version of ChatGPT?

The user interface for ChatGPT has changed. All messages in ChatGPT default to using GPT-4o, and the model can be changed to GPT-3.5 using a toggle underneath the response.

What Does GPT-4o Mean for the Future?

There are two schools of thought about where AI should head. One is that AI should become ever more powerful and able to accomplish a wider range of tasks. The other is that AI should get better at solving specific tasks as cheaply as possible.

OpenAI’s mission to create artificial general intelligence (AGI), as well as its business model, put it firmly in the former camp. GPT-4o is another step towards that goal of ever more powerful AI.

This is the first generation of a completely new model architecture for OpenAI. That means that there is a lot for the company to learn and optimize over the coming months.

In the short term, expect new types of quirks and hallucinations, and in the long term, expect performance improvements, both in terms of speed and quality of output.

The timing of GPT-4o is interesting. Just as the tech giants have realized that Siri, Alexa, and Google Assistant aren’t quite the money-making tools they once hoped for, OpenAI is hoping to make AI talkative again. In the best case, this will bring a raft of new use cases for generative AI. At the very least, you can now set a timer in whatever language you like.

Conclusion

GPT-4o represents further progress in generative AI, combining text, audio, and visual processing into one efficient model. This innovation promises faster responses, richer interactions, and a wider range of applications, from real-time translation to enhanced data analysis and improved accessibility for the visually impaired.

While there are initial limitations and risks, such as potential misuse in deepfake scams and the need for further optimization, GPT-4o is another step towards OpenAI’s goal of artificial general intelligence. As it becomes more accessible, GPT-4o could change how we interact with AI, integrating into daily and professional tasks.

With its lower cost and enhanced capabilities, GPT-4o is poised to set a new standard in the AI industry, expanding the possibilities for users across various fields.

The future of AI is exciting, and now is as good a time as any to start learning how this technology works. If you’re new to the field, get started with our AI Fundamentals skill track, which covers actionable knowledge on topics like ChatGPT, large language models, generative AI, and more. You can also learn more about working with the OpenAI API in our hands-on course.


Blog By : Trupti Thakur
