New AI Models from Google, OpenAI Emerge

by John Digweed · 2 months ago

Google and OpenAI Unveil Advanced AI Models, Shifting Industry Landscape

The artificial intelligence landscape is shifting quickly, with Google and OpenAI continuing to push the limits of what AI can do. This past week brought significant announcements from both companies, including advances in image generation, multimodal capabilities, and reasoning engines. The updates point to models that are not only more powerful but also more deeply integrated into complex tasks.

Google Enhances Image Generation and Multimodal Understanding

Google has quietly rolled out significant upgrades to its AI offerings. The latest iteration of its image generation model, Nano Banana 2, is now available within the Gemini Pro plan.

This update brings a notable increase in detail, advanced world knowledge, precise text rendering and translation, and improved subject consistency, allowing for up to five characters and 14 objects within a single image. While 4K upscaling was present in previous versions, Nano Banana 2 refines these capabilities.

Beyond image generation, Google has also upgraded its educational tool, NotebookLM. The new NotebookLM cinematic overview feature generates informative videos complete with animations and motion graphics.

While the underlying technology for generating on-demand motion graphics remains somewhat mysterious, this feature is designed to assist users in creating video content. Currently, this advanced feature is accessible to users on Google’s premium Ultra plan, priced at $200-$250 per month, likely due to the substantial token usage required.

Perhaps the most impactful release from Google is Gemini 3.1 Pro. Positioned as Google’s flagship model, Gemini 3.1 Pro is a natively multimodal, high-reasoning system designed for professional and developer tasks. Unlike many AI models that specialize in a single data type, Gemini’s strength lies in its ability to process and understand various inputs, including video, audio, and images.

The new version offers enhanced multimodal understanding, with its MMMU-Pro score reportedly reaching 76.8. The model is built with the concept of “world models” in mind, featuring stronger reasoning, improved reliability, longer and more structured outputs, and a context window of up to 1 million tokens.

This allows it to handle vast amounts of data, from lengthy documents and codebases to long videos and audio files. Function calling and search grounding with Google Search further enhance its utility, making it a powerful tool for those already within the Google ecosystem.

OpenAI Counters with GPT 5.4 Pro and Agentic Capabilities

In response, OpenAI has launched GPT 5.4 Pro, positioned as the most advanced model currently available. While Gemini 3.1 Pro excels at multimodal tasks, GPT 5.4 Pro is engineered for cutting-edge reasoning, particularly in areas like frontier mathematics, computer use, and complex scientific problems.

This makes it the go-to model for scientists, researchers, and professionals engaged in high-stakes technical work. GPT 5.4 Pro is accessible through ChatGPT Plus and Enterprise, as well as via the API.

A significant improvement in GPT 5.4 Pro is its enhanced ability to handle standard conversations and reduce reasoning mistakes that plagued earlier versions, such as GPT 5.2. While users are encouraged to test its performance on their specific use cases, initial observations suggest a marked improvement in conversational fluidity and accuracy.

Microsoft has also entered the fray with “Copilot Tasks,” a system designed to automate to-do lists. Users describe tasks in natural language, and Copilot executes them by working across various applications and services, reporting back upon completion. Microsoft frames this as the second chapter of AI, moving from conversational agents to task-executing systems.

Examples include automatically surfacing urgent emails with draft replies, unsubscribing from promotional mail, tracking apartment listings, and creating study plans from syllabi. Copilot Tasks includes safeguards, requiring consent for significant actions like financial transactions or sending messages on behalf of the user. It is currently in a limited research preview, with a broader launch planned.
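The consent safeguard described above can be pictured as a small gate in front of each action. The following Python sketch is only an illustration under stated assumptions, not Microsoft's implementation: the names REQUIRES_CONSENT, execute, and ask_user are hypothetical, and the set of "significant" action kinds is a guess based on the two examples in the article.

```python
from typing import Callable

# Hypothetical set of action kinds treated as "significant".
# The article names financial transactions and sending messages.
REQUIRES_CONSENT = {"payment", "send_message"}


def execute(kind: str, action: Callable[[], str], ask_user: Callable[[str], bool]) -> str:
    """Run an action, asking for approval first when its kind is significant."""
    if kind in REQUIRES_CONSENT and not ask_user(kind):
        # The user declined, so the action is never invoked.
        return f"skipped {kind}: consent denied"
    return action()
```

In this sketch, low-risk actions such as tracking a listing run without a prompt, while a payment only runs if the ask_user callback returns True.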

The term “Microslop” has become a derogatory label in some online communities, reflecting frustration with Microsoft’s pervasive integration of AI into its products, which some perceive as low-value “AI slop.” The company’s decision to ban the term on its official Discord server has only amplified its use.

Perplexity has unveiled a “general-purpose digital worker” that operates within user interfaces and can reason, delegate, search, build, and remember across tasks over extended periods. The system draws on multiple AI models: Claude Opus 4.6 serves as the core reasoning engine and orchestrates specialized agents, with Gemini handling research, Nano Banana handling images, and GPT 5.2 handling long-term recall.

It functions as a model-agnostic orchestration layer, selecting the best AI for each sub-task. Perplexity’s offering is priced at $200 per month for its Max tier, providing a powerful, albeit premium, solution for power users seeking to automate complex workflows without deep technical expertise.
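A model-agnostic orchestration layer of this kind can be sketched as a routing table that maps each sub-task category to a model. This is a minimal illustration, not Perplexity's implementation: ROUTES, SubTask, route, and plan are hypothetical names, the category labels are assumptions, and the model labels are simply the ones mentioned in the article.

```python
from dataclasses import dataclass

# Hypothetical routing table: sub-task category -> model label.
# The labels mirror the models named in the article; the mapping
# itself is an assumption made for illustration.
ROUTES = {
    "reasoning": "Claude Opus 4.6",
    "research": "Gemini",
    "image": "Nano Banana",
    "recall": "GPT 5.2",
}
DEFAULT_MODEL = ROUTES["reasoning"]


@dataclass
class SubTask:
    category: str
    prompt: str


def route(task: SubTask) -> str:
    """Return the model label for a sub-task, falling back to the core reasoner."""
    return ROUTES.get(task.category, DEFAULT_MODEL)


def plan(tasks: list[SubTask]) -> list[tuple[str, str]]:
    """Pair each sub-task prompt with the model chosen for it."""
    return [(route(t), t.prompt) for t in tasks]


if __name__ == "__main__":
    steps = plan([
        SubTask("research", "Find recent apartment listings"),
        SubTask("image", "Draw a floor plan"),
        SubTask("unknown", "Summarize everything"),
    ])
    for model, prompt in steps:
        print(f"{model}: {prompt}")
```

A real system would presumably choose models dynamically rather than from a fixed table; the fixed table just makes the "best model per sub-task" idea concrete.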

AI Drama and Ethical Considerations

The AI community has been abuzz with significant developments and controversies. Anthropic faced a major challenge when President Trump called for a six-month phase-out of its technology from all federal agencies, citing concerns about the company dictating military operations.

This public statement and subsequent executive order created a rift between Anthropic and the U.S. government, raising questions about the ethical use of AI in sensitive applications like mass surveillance and autonomous weapons. Anthropic had previously stated its inability to ethically support such uses of its technology.

Adding to the disruption, an estimated 2.5 million users have reportedly shifted away from ChatGPT. This movement, dubbed “QuitGPT,” was fueled by several factors, including a $25 million donation by OpenAI’s Greg Brockman to a pro-Trump organization, the use of ChatGPT in an ICE resume screening tool, and OpenAI’s reported Pentagon deal, a contract that Anthropic had declined on ethical grounds. These events, coupled with a perceived degradation in ChatGPT’s performance with the 5.2 model, have led users to explore alternatives like Claude.

Further internal turmoil at OpenAI involved the firing of an employee suspected of leaking information to prediction markets. Meanwhile, the disbanding of Alibaba’s Qwen team has been attributed to corporate restructuring clashing with the technical vision of its lead, highlighting the challenges of aligning research and business objectives.

Robotics Advances and Humanoids in Factories

In robotics, Stanford has developed FSME, a memory system that enables robot AI to learn physical principles in real time without retraining. This system allows robots to learn from experience, bridging the gap between abstract knowledge and practical application.

FSME uses a tiered memory system: episodic memory for raw experiences, hypothesis generation for understanding causes, and principle promotion for future actions. Its principled abstraction achieved a 76% success rate in tests, a significant improvement over raw experience retrieval.
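The three tiers can be pictured with a toy Python sketch. This is not Stanford's method, only an illustration under stated assumptions: the TieredMemory class, its method names, and the promotion thresholds are all hypothetical, and a "hypothesis" here is reduced to the claim that a given action works in a given context.

```python
from collections import defaultdict


class TieredMemory:
    """Toy three-tier memory: episodes -> hypotheses -> promoted principles."""

    def __init__(self, min_trials: int = 3, min_success_rate: float = 0.75):
        self.episodes = []                        # tier 1: raw experiences
        self.stats = defaultdict(lambda: [0, 0])  # tier 2: hypothesis -> [successes, trials]
        self.principles = set()                   # tier 3: promoted hypotheses
        self.min_trials = min_trials              # assumed threshold
        self.min_success_rate = min_success_rate  # assumed threshold

    def record(self, action: str, context: str, success: bool) -> None:
        """Store an episode and update the hypothesis it bears on."""
        self.episodes.append((action, context, success))
        hypothesis = (action, context)  # "this action works in this context"
        stats = self.stats[hypothesis]
        stats[0] += int(success)
        stats[1] += 1
        self._maybe_promote(hypothesis)

    def _maybe_promote(self, hypothesis) -> None:
        """Promote a well-supported hypothesis, or demote one that stops holding."""
        successes, trials = self.stats[hypothesis]
        if trials >= self.min_trials and successes / trials >= self.min_success_rate:
            self.principles.add(hypothesis)
        else:
            self.principles.discard(hypothesis)

    def advise(self, context: str) -> list[str]:
        """Return actions backed by a promoted principle for this context."""
        return sorted(a for a, c in self.principles if c == context)
```

Nothing in the sketch updates model weights: behavior changes only because the memory contents change, which is the sense in which such a system learns without retraining.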

Physical Intelligence, a well-funded robotics AI startup backed by prominent investors, is developing foundation models for robots. Its latest release, MEM (Multi-Scale Embodied Memory), combines short-term visual tracking with long-term natural language narratives, allowing robots to stay on task for extended periods of up to 15 minutes, long enough to clean a kitchen or prepare a meal. The memory system also lets robots adapt their strategies based on past successes and failures, demonstrating context adaptation and improved problem-solving.

Faraday Future has entered the humanoid robot market with FFAI Robotics Inc., launching embodied AI robots that bear a striking resemblance to existing Chinese models like the AgiBot A2 and X2. Its offerings include a full-size professional humanoid starting at $35,000 and an athletic action humanoid for $20,000, alongside a quadrupedal security companion robot.

The integration of humanoids into industrial settings is also progressing. BMW has deployed its first humanoid robots in its European plants, exploring their potential for factory tasks. While still in the early stages, these deployments represent a significant step towards realizing the future of AI-powered automation in manufacturing.

Source: AI News – New Models From Google & OpenAI, AI Drama & Humanoids In Factories (YouTube)
