GPT-4 was just announced: here's what you can expect...
OpenAI has just released GPT-4, a new version of its groundbreaking language model that can also "see" images! That's right, folks: this AI can not only write like a human, but also reason about visual inputs. How cool is that? In this post, I will give you a brief overview of what GPT-4 is, what it can do, and why it matters for the future of AI. So buckle up and get ready for a wild ride!
- The AI Frontier
If you have been following the progress of artificial intelligence (AI) in recent years, you might have heard of GPT-3.5, the powerful language model that can generate realistic text on almost any topic, given a few words or sentences as input. GPT-3.5 was created by OpenAI, a research organization backed by Microsoft and other tech giants, and it powered ChatGPT, a popular chatbot that can converse with humans on various topics.
But GPT-3.5 is not the end of the story. OpenAI has just announced GPT-4, the latest milestone in its effort to scale up deep learning. GPT-4 is a large multimodal model that can accept both image and text inputs, and emit text outputs. This means that it can not only write like a human, but also understand and describe images like a human.
GPT-4 is also much more capable than GPT-3.5 in many ways. It exhibits human-level performance on various professional and academic benchmarks, such as passing a simulated bar exam with a score around the top 10% of test takers, or solving math problems from Olympiads and AP exams. It is also more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5. Here are some examples from OpenAI's blog post and website:
We can see a very clear trend: GPT-4 is a significant upgrade, both in how much text it can handle at once (the word limit) and in its capabilities. It makes GPT-3.5 look like a joke on tasks like AP Calculus and the Uniform Bar Exam.
It also does way better on all the most common categories that GPT-3.5 was used for, as you can see below:
The model also does much better in different languages:
How did OpenAI achieve this remarkable feat? According to its blog post, it took several steps to improve its deep learning stack and co-designed a supercomputer with Azure for its workload. It also spent six months iteratively aligning GPT-4 using lessons from its adversarial testing program and from ChatGPT, which led to its best results yet on factuality, steerability, and refusing to go outside of guardrails. GPT-4 is also widely believed to have been trained with far more parameters, which would go a long way toward explaining why it does so much better than GPT-3.5.
GPT-4's multimodal ability allows it to analyze text and images together and generate text outputs based on both. For example, it can describe what is happening in a picture or answer questions about it. We think it would be able to recognize objects in a picture, remove or replace them, and much more, kind of like DALL-E 2 or Midjourney, but in a chat interface. This ability makes GPT-4 more versatile and capable of solving more complex problems that require visual understanding. GPT-4's multimodal ability is still in the research preview stage, but we can see early examples of the model in this video from OpenAI and in some screenshots which are available on their website:
(An actual example!)
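To make the multimodal idea a bit more concrete, here is a purely speculative sketch of what asking GPT-4 a question about an image might look like from Python. OpenAI has not published an image-input API yet (it is limited to a single partner for now), so the message format and the "image_url" content part below are assumptions for illustration only, not the real interface:

```python
# Speculative sketch only: image inputs are not publicly available yet,
# and the "image_url" content part below is an assumed format, not
# OpenAI's published interface.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-4",  # hypothetical vision-enabled variant, once released
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is unusual about this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```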
OpenAI is also releasing GPT-4's text input capability via ChatGPT Plus; however, they do mention that the number of tokens you can use with the model will be capped while they scale their systems to accommodate more users. They are also releasing the GPT-4 API (with a waitlist, of course). The image input capability is still in preview and will initially be available to a single partner. OpenAI is also open-sourcing OpenAI Evals, its framework for automated evaluation of AI model performance, to allow anyone to report shortcomings in its models and help guide further improvements.
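If you just want to try the text-only model, here is a minimal sketch of what a GPT-4 call through the existing openai Python library looks like, assuming your access request from the waitlist has been approved (the prompt and the token cap below are placeholders, not recommended values):

```python
# Minimal sketch of a text-only GPT-4 request via the openai Python library,
# assuming API access has been granted through the waitlist.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what makes GPT-4 different from GPT-3.5."},
    ],
    max_tokens=256,  # expect usage caps while OpenAI scales capacity
)

print(response.choices[0].message.content)
```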
However, GPT-4 also raises some serious challenges and concerns for the future of AI and multimodality. One of them is the potential for misuse and abuse of such powerful technology by malicious actors, or for unintended consequences. OpenAI has acknowledged that GPT-4 may have social biases, hallucinations, and safety issues that need to be addressed with more regulation and alignment training. Another challenge is the competition and innovation pressure that GPT-4 creates for other AI players, such as Google, which released its own generative AI chatbot, Bard, in February 2023. How will these rival models coexist and collaborate in the AI ecosystem? This could, however, be seen as a win-win for users, as they end up getting the best tech.
Despite these challenges, GPT-4 also offers a hopeful future for AI and multimodality. It demonstrates the remarkable progress that has been made in deep learning research and applications in recent years. It opens up new possibilities for enhancing human capabilities, creativity, and productivity across various domains. It also invites more dialogue and collaboration among researchers, developers, users, regulators, and society at large to ensure that AI is used responsibly and ethically for the benefit of humanity.