Monday 25 September 2023

OpenAI Enhances ChatGPT to Enable Visual, Auditory, and Verbal Capabilities

OpenAI Upgrades ChatGPT with Voice and Image Capabilities

OpenAI is making significant updates to its viral chatbot, ChatGPT, allowing the AI tool to “see, hear, and speak.” With these latest upgrades, users can hold back-and-forth voice conversations with the chatbot and receive responses to image prompts. These improvements bring ChatGPT closer to the capabilities offered by popular tools like Apple’s Siri, Google Lens, and Amazon’s Alexa.

Enhanced User Experience with Voice and Image

The introduction of voice and image capabilities provides users with additional ways to interact and use ChatGPT in their daily lives. For example, when traveling, users can snap a picture of a landmark and have a live conversation with ChatGPT about its interesting features. Similarly, users can take pictures of their fridge and pantry to determine what to cook for dinner, asking follow-up questions for step-by-step recipes. ChatGPT can even provide hints for solving math problems when a user shares a photo of the problem set.

Human-like Audio with Text-to-Speech Model

To enable voice interactions, OpenAI has integrated a sophisticated text-to-speech model into ChatGPT. This model generates human-like audio from text inputs and a short voice sample. The company enlisted professional voice actors to create the diverse voices available in ChatGPT. Moreover, OpenAI utilizes Whisper, an open-source speech recognition system, to transcribe spoken words into text.

Potential Risks and Mitigations

While the new voice capabilities bring exciting possibilities, OpenAI acknowledges potential risks, such as fraud and impersonation. The ability to craft realistic synthetic voices from a few seconds of real speech opens doors for creative and accessibility-focused applications, but malicious actors could exploit the same capability to impersonate public figures or commit fraud. OpenAI says it has a responsibility to address these concerns and protect individuals’ privacy.

OpenAI also recognizes the challenges associated with vision-based models. It has taken measures to limit ChatGPT’s ability to analyze and make direct statements about individuals, as the system is not always accurate. Maintaining privacy and ensuring that the model interprets images correctly in high-stakes domains remain ongoing challenges.

User Testing and Availability

OpenAI has rigorously tested the model with “red teamers” to assess risks in domains like extremism and scientific proficiency. The company also engaged a diverse set of alpha testers for feedback. OpenAI plans to release the voice and image capabilities to users of the Plus and Enterprise versions of ChatGPT within the next two weeks.

With these updates, ChatGPT aims to provide an enhanced conversational experience that incorporates voice and image interactions, opening up new possibilities for users in various domains.

Editor Notes

OpenAI’s continuous efforts to improve ChatGPT by adding voice and image capabilities are commendable. These upgrades expand the practical application of AI in daily life, making it more accessible and useful for users. However, it is crucial to stay vigilant about the potential risks associated with these advancements, such as impersonation and privacy concerns. OpenAI’s commitment to mitigating these risks demonstrates responsible AI development. As ChatGPT evolves, we can expect even more innovative features to benefit users across different industries and sectors.


from GPT News Room https://ift.tt/fwPCFWI
