Frontier Models
Google Gemini Updates
- A summary of Gemini updates from January 2026 is available in this post here.
- Chrome has a built‑in Gemini sidebar that can read across tabs, access Gmail/YouTube/Search history, and auto-browse sites to complete tasks like booking appointments or collecting tax documents.
- Google puts Gemini directly into Google Maps so you can talk hands-free while walking or cycling, like asking it to text a friend you’re 10 minutes late or find a cafe with a bathroom on your route.
- DeepMind launched Project Genie, an AI that generates explorable 3D worlds from text, simulating physics and interactions in real time.
- Gemini’s new Personal Intelligence feature lets it securely look at your Gmail, Photos, Calendar, and other Google apps so it can do things like find your car’s license plate from a photo, suggest the right tires for your minivan, or plan a spring break trip based on your family’s past travels and interests.
- Students can take full-length free SAT practice exams in Gemini. Type “I want to take a practice SAT test” as your prompt, and Gemini will create the test.
- Google and Khan Academy are partnering to use Google’s Gemini AI models to launch new tools like Writing Coach and Reading Coach that help students improve literacy. Instead of just giving answers, Writing Coach walks students through planning, drafting, and improving their own writing.
- Gmail is adding Gemini AI (AI inbox) so it can summarize your emails, answer questions about them, write and improve replies for you, and show a special inbox that highlights your most important messages.
- 40 most helpful AI tips from Google.
- Google’s new Gemini 3 Flash is a faster, more capable, multimodal “workhorse” model that is now the default in the Gemini app and Search, offering near frontier-level performance at low cost for consumers, enterprises, and developers.
- Google Labs has launched Disco, an experimental AI-centric web browser. Its standout feature, GenTabs, uses Gemini 3 to turn your queries into interactive mini web apps that help you answer questions or get tasks done directly in the browser.
- Google Labs updates summary in 2025.
- Google is updating its experimental Stitch design tool with Gemini 3 to generate higher-quality UIs and adding a new “Prototypes” feature that lets you connect screens into interactive, working app flows.
- Everything Google launched in 2025 for education: 2025 Launch Guide
- Google has introduced Gemini 3, its most advanced AI model, which excels in multimodal understanding, state-of-the-art reasoning, and agentic capabilities.
- Google Maps adds a “Know Before You Go” feature that gathers details about your destination ahead of time. It also redesigned the Explore tab, added nickname-based reviews, and now shows real-time EV charging availability.
- Google launched Google Scholar Labs, a new AI-powered feature in Google Scholar that analyzes complex research questions, searches related topics and relationships, and explains relevant scholarly papers.
- Gemini 3 Pro Image (Nano Banana Pro) offers precise studio-quality control, clear text generation, multilingual localization, and advanced creative features. Hint: Using the default “fast” model on Gemini will only kick off “Nano Banana.” To use “Nano Banana Pro,” pick the “Pro” model on Gemini.
- Upgrades to Google Photos: 1) Nano Banana AI creative image restyles. 2) Edit photos by typing or speaking requests. 3) Quick AI-generated templates for cards and headshots. 4) “Ask Photos” search expands globally. 5) New in-image “Ask” button for info or edits.
- Google launched new AI shopping features: 1) AI helps you search for products and compare them. 2) Personalized and conversational shopping is made easier. 3) You can ask Google to check stores for stock. 4) Google can automatically buy items you want.
- Claims: Google’s updated Nano Banana 2 (late 2025) is said to overcome a well-known weakness of image generators: the informal “clock and wine glass” test, which asks a model to render an analog clock showing a specific time and a wine glass filled to the brim. Correct hand placement and a readable clock face have long been hard to get right, and a brim-full wine glass is difficult because most photographic training data shows glasses only about half full, so both tasks expose limits in training data and conceptual understanding.


- Google has rolled out its Opal AI app builder to 160+ countries, letting teams create internal tools without coding or engineering delays.
- Google Maps is rolling out Gemini-powered features including hands-free conversational navigation, landmark-based directions, proactive traffic alerts, and AI-powered visual place discovery to make driving and exploring easier and smarter on Android and iOS in the US.
- Gemini’s Deep Research can pull in info from @Gmail, @GoogleDrive, and Chat.
- Google’s Gemini Canvas can generate complete presentations from a single prompt or uploaded notes, integrating with Workspace tools.
- Google Cloud launched Gemini Enterprise, a unified AI platform enabling businesses to deploy advanced, secure chat agents that automate workflows and integrate with enterprise apps.
- See virtual try-ons, e.g., for shoes.
- Google launched a “Computer Use” model: The Gemini 2.5 Computer Use model lets AI systems use computers much like a human would. For example, it can click, type, and fill out forms on websites and apps. Developers can use it to automate tasks in a browser (like organizing digital sticky notes or booking appointments online).
- Google launched “Learn Your Way” which is a research experiment that uses generative AI to turn traditional textbook materials into interactive, adaptive learning experiences tailored to a student’s grade level and interests, offering tools like mind maps, audio lessons, and quizzes for personalized education. Learning is very personalized and tailored now!
- Google is upgrading its Google Photos app with Veo 3, letting users turn their still pictures into four-second video clips. Instructions: From the Google Photos app, choose “photo to video” from the dropdown menu.
- Gemini 2.5 Flash Image Maker (select Gemini Native Image, formerly known as Nano Banana): Google’s new image maker improves prompt precision and subject consistency, keeping faces consistent across edits like pose shifts or lighting changes, something GPT-4o and Firefly still struggle with. Try building professional headshots! Google also released image generation prompting tips. See some use cases here. Update: Gemini has overtaken ChatGPT as the No. 1 iPhone app as of September 2025 due to the Nano Banana craze (e.g., turning selfies into hyperrealistic 3D figurines).
- Google Translate challenges Duolingo: Powered by Gemini, Google Translate now offers real-time conversation support in 70+ languages and adaptive language lessons tailored for travel, study, or business, rolling out on iOS and Android.
- Google Docs adds AI-powered read-aloud feature: Google is updating Docs to let users generate audio versions of their documents using AI, making it easy to listen to content read aloud.
- Google Photos adds voice and text editing: You can now say or type edits like “brighten face” or “remove object” in Google Photos.
- Guided Learning in Gemini: Gemini challenges ChatGPT with new learning tool. Google has launched Guided Learning in Gemini, a feature like ChatGPT’s Study Mode that breaks down problems and quizzes users step-by-step.
- Coding agent Jules: Jules, Google’s AI coding assistant, is now out of beta. Powered by Gemini 2.5 Pro, it works autonomously in the cloud so you can assign tasks and let it handle bug fixes without supervision. A free version is available, with higher usage unlocked via subscription.
- Genie 3: Genie 3 is a new world model that can generate interactive, consistent environments in real-time, enabling embodied AI agents to navigate and explore fantastical and realistic landscapes. The model pushes the boundaries of world simulation, with breakthroughs in environmental consistency, real-time interaction, and promptable world events.
- Storybook: You can create storybooks with a prompt like this. Storybook turns simple prompts into 10-page, voice-narrated children’s stories. Each page is illustrated in a chosen style, like claymation, comics, or anime.
- Google just launched Gemini 2.5 Deep Think, its most advanced AI yet. It uses multiple AI agents working together to solve complex questions. This new kind of multi-agent AI is powerful but expensive to run. Other labs like OpenAI and xAI are also working on similar tech to build smarter (but slower and costlier) AI.
- Google added an image-to-video tool to Veo 3 in Gemini, letting Pro and Ultra users turn photos into clips with text—advancing its multimodal platform. Here’s how to try it:
- Open the Gemini app or gemini.google
- Tap Video in the prompt bar (Pro/Ultra only)
- Upload a photo
- Enter your video description
- Turn on sound and enjoy
- Google added three Gemini-powered AI modes to Firebase Studio:
- Ask – Discuss and plan with Gemini
- Agent (needs approval) – Gemini suggests changes to your app
- Agent Auto-run – Gemini auto-applies changes to your app
OpenAI Updates
- OpenAI has released GPT‑5.3 Codex, its fastest and most capable coding model.
- OpenAI launched Frontier: An enterprise platform that lets companies create AI “coworkers” that understand their business, have proper permissions, and can do real work like optimizing production, troubleshooting failures, or running sales workflows across existing systems. More info here.
- OpenAI launched Prism, a free GPT-5.2-based AI workspace that helps researchers write, edit, and analyze scientific papers using tools like LaTeX and automated citation management. Check it here.
- OpenAI will start showing clearly labeled ads at the bottom of answers for users on the free and low-cost ChatGPT Go plans (for example, travel or shopping promos after you ask about a trip), while keeping higher-tier paid plans ad‑free. Issues: This could mislead users, exploit emotional trust, and misuse sensitive personal data for targeted advertising. Update: OpenAI started testing ads here.
- OpenAI launched ChatGPT Go, an $8/month plan with expanded features and GPT-5.2 access, while confirming it will soon test ads in the Free and Go tiers for U.S. users.
- OpenAI launches ChatGPT Health, a dedicated ChatGPT experience that connects to your medical records and wellness apps so you can better understand tests, prepare for appointments, and manage health and insurance decisions, without providing formal diagnoses. Right after this, Anthropic and Amazon also launched AI-health integrations.
- OpenAI now lets ChatGPT users personalize its tone and style—like making it warmer, more enthusiastic, or emoji-filled—by adjusting settings under Settings → Personalization.
- ChatGPT now has an in-chat app store, turning it into an action-oriented platform where users can access services like Expedia, Spotify, and Zillow directly within conversations to book travel, make playlists, or find homes. Developers can now build mini-apps with the OpenAI Apps SDK, submit them for review through the developer platform, and, once approved, have them listed in a new in‑ChatGPT app directory that users can discover via the tools menu or a direct apps URL.
- OpenAI released GPT-5.2-Codex, a GPT-5.2 variant built for long, agent-driven coding and cybersecurity work. It stays reliable over very long coding sessions without losing context.
- GPT-Image-1.5: OpenAI introduced a new ChatGPT Images capability, using the GPT‑Image‑1.5 model to provide more accurate, faster image generation and editing, including better text rendering and preservation of visual details.
- OpenAI introduced GPT-5.2, a large language model designed to improve reasoning and coding performance for developers and professional users. It is positioned as a direct competitor to Google’s Gemini 3 and was developed amid growing concerns about compute costs and the absence of an updated image-generation system. GPT 5.2 Pro generated a proof for Erdős Problem #397 (a long‑standing, unsolved number theory question asking whether there are infinitely many integer solutions to a specific equation involving central binomial coefficients).
- OpenAI launched a new “shopping research” feature in ChatGPT, an interactive tool that conducts deep, personalized product research across the web, asking clarifying questions, comparing options, and then generating a tailored buyer’s guide with up-to-date details to help you choose what to buy.
- OpenAI tests group chats in ChatGPT: OpenAI is piloting a feature that lets users chat with friends and ChatGPT together in the same thread, enabling collaborative trip planning, project work, and casual group conversations.
- OpenAI has released GPT-5.1, a more conversational and adaptable version of ChatGPT featuring enhanced reasoning abilities and easier customization of tone and style for users, with rollout starting for paid subscribers and gradual expansion to all users.
- Users can now interrupt long-running queries and add new context without restarting. The model updates its response based on the new input when users click Update in the sidebar.
- “Company Knowledge:” OpenAI’s “company knowledge” lets ChatGPT connect to workplace apps like Slack, Google Drive, and GitHub so it can answer questions with accurate, business-specific context, such as automatically generating a client call briefing using recent messages, emails, and meeting notes. For example, it can summarize customer feedback by pulling insights from Slack channels, Google Slides, and support tickets, recommending next steps based on your company’s real data.
- OpenAI launches ChatGPT Atlas, a conversational browser: ChatGPT Atlas can talk with you across tabs, remember context, and handle online tasks like booking flights or editing documents, making web browsing fully interactive.
- A New App Store: At its DevDay event (an annual developer conference hosted by OpenAI), OpenAI unveiled ChatGPT apps, monetization tools, and access to its 800M users—potentially creating the AI era’s version of the App Store. OpenAI’s 2025 DevDay revealed its ambition to turn ChatGPT into a “platform play,” integrating third-party apps so users can access services like Spotify and Zillow directly in ChatGPT, signaling their effort to become the web’s new homepage.
- OpenAI’s “Agent Builder:” OpenAI is about to launch Agent Builder, a new tool that helps people create AI agents (automated programs) easily. You use a drag-and-drop interface to build workflows—like customer bots or tools for finding and comparing information—with no coding needed. It lets you connect different features, set up logic (like if/else decisions), and control how the agent works, similar to other tools like Zapier. The goal is to help anyone—especially developers and businesses—build and test AI agents quickly and with less technical skill, making it much easier to automate tasks using AI. Here are 50 sample use cases. You can access the agent builder here: https://platform.openai.com/agent-builder. Here is a guide.
- Sora 2 video and Sora app (social media): OpenAI launched Sora 2, a model for generating hyper-realistic video, and the Sora app, a TikTok-style feed to showcase them. The app includes a “cameo” feature to insert yourself or friends into AI-generated scenes, with revocable access controls. See the difference between Sora and Sora 2 videos here. Issue: OpenAI’s Sora Makes Disinformation Extremely Easy and Extremely Real: Examples of misinformation include ballot fraud, immigration arrests, protests, criminal acts (robbery, home intrusions), bomb explosions, and fake images of war. Takeaway: The line between real and fake is increasingly blurry.
- OpenAI adds Pulse to ChatGPT: Pulse acts like a proactive personal assistant. Instead of waiting for you to ask it questions, Pulse works overnight and analyzes your interests, your recent chats, and any connected data you have with ChatGPT. Each morning, it gives you a personalized summary or set of recommendations with 5–10 AI-generated briefs from news, Gmail, Calendar, and past chats that it thinks you’ll find useful. Pro users see them as “cards,” with expansion to Plus and free tiers planned.
- OpenAI has launched Instant Checkout in ChatGPT, allowing U.S. users to directly buy products from Etsy (and soon Shopify merchants) without leaving the chat, powered by the new Agentic Commerce Protocol co-developed with Stripe. Update: PayPal has also partnered with OpenAI to enable ChatGPT users to make instant purchases using PayPal.
- Project sharing for teamwork and better integration: Shared projects: teams work together, share files, and ChatGPT uses everyone’s context. Smarter connectors: automatic integration with Gmail, Outlook, Teams, GitHub, Dropbox, and more.
- OpenAI has announced it is developing an AI-powered hiring platform called the OpenAI Jobs Platform, set to launch by mid-2026, that will directly compete with LinkedIn by using AI to match candidates with businesses and offer certifications in AI fluency through its OpenAI Academy. Related: LinkedIn is launching an AI-powered Hiring Assistant to help recruiters spot overlooked talent and focus on key tasks.
- OpenAI has added a much-requested branching feature to ChatGPT, letting users spin off new threads from any conversation to test different ideas, drafts, or code. You can find it under the ‘More actions’ menu at the bottom of any chat, then select “branch in new chat.”
- ChatGPT has the ability to monitor chats, flag suspicious activity, and report certain conversations to law enforcement. Outcomes show that finding the right balance is challenging, e.g., flagging non-critical chats.
- OpenAI launched GPT-5, claiming it is faster, more accurate, and less prone to “hallucination” (80% reduction). GPT-5 automatically selects appropriate models and proactively suggests actions to accomplish tasks. The new model replaces all legacy models, including GPT-4o, o3, and o4 variants, because it is the first “unified” AI model that combines deep reasoning and faster responses and automatically picks the right model for the given task. It comes with a new adaptive setup: a fast base model, a deeper “GPT-5 Thinking” engine for complex tasks, and a smart router that requires no manual selection. It can generate complex content like business plans and coding applications with minimal prompting and lets even novices build simple software apps from short text prompts, marking a significant step toward artificial general intelligence (AGI is the level where AI possesses the cognitive abilities of a human). Related: They also released a ChatGPT Prompting Guide.
- Issues with the GPT-5 launch: GPT-5 aimed to advance complex reasoning with a new “Thinking” mode, but routing issues caused it to default to faster, less capable models. As a result, it underperformed GPT-4o in algebra, coding, and logic tasks, and lagged behind rivals like Claude 4.1 in real-world tests. Security flaws, including prompt injection vulnerabilities, have added to concerns over reliability. The company said it will bring back GPT-4o. GPT-5 is also not as friendly, so updates will make it “warmer and friendlier.”
- ChatGPT now has agentic capabilities, allowing it to interact with websites, gather information, and complete complex tasks on your behalf using its own virtual computer. This unified system combines Operator’s web interaction, deep research’s synthesis skills, and ChatGPT’s intelligence, enabling it to handle workflows from start to finish. Select “Agent Mode” from the dropdown menu when you click on the “+” sign on the chatbox.
- ChatGPT releases “Study Mode:” Instead of just getting quick answers, study mode in ChatGPT offers step-by-step guidance to support deeper learning, with interactive prompts, scaffolded responses, personalized support, and knowledge checks to help students build understanding. Select “Study and learn” from the dropdown menu when you click on the “+” sign on the chatbox. Sample prompt: “Act like a math tutor. Give me a calculus problem, ask me for each intermediate step (like finding the derivative, simplifying, etc.), and only reveal the next step when I get it right.“

Anthropic Updates
- Anthropic launches Claude Opus 4.6, its most advanced model focused on enterprise knowledge work, with stronger planning, a 1M context window, and multi-agent “agent teams” for complex tasks.
- Claude is adding clickable mini‑apps inside so paid users can do things like send Slack messages, design in Canva or Figma, and pull Box files without leaving the chat.
- Claude launched “Claude in Excel,” an add-in that lets you chat with Claude directly inside Excel so it can understand your whole workbook, explain formulas, and safely build or edit models for you. There is also Claude in PowerPoint here.
- Anthropic unveils a detailed new constitution for Claude that sets out practical rules—like refusing to give self-harm instructions while offering coping resources, declining to help with malware but explaining cybersecurity basics, and answering political questions by clarifying facts without persuasion—to keep the model safe, ethical, transparent, and reliably helpful for users and developers. Long version here.
- Anthropic released new healthcare AI updates, letting Claude connect to apps like Apple Health and Health Connect to summarize medical data and offer personalized advice.
- Anthropic’s Claude Code accelerates software development from months to days, as shown by building Cowork in just 10 days. This challenges SaaS subscriptions, with stocks like Intuit and Salesforce dropping double digits last week, while data-rich firms gain. Find 200+ Claude Code Sub-Agent Prompts here. Ideas on how to automate your life with Claude Code (for non-technical people). Here is a fun WSJ article about Claude Code. Fun example here. 50 good Claude Code prompts. Create presentations like this one. Top Claude Code strategies from its creator. Resources to master Claude Code here.
- Anthropic launched Cowork, a user-friendly version of its popular Claude Code for handling everyday tasks across documents, files, and the web. It is an agent that reads and edits any document on your computer. See an example of it organizing a messy desktop here. And here is a tutorial. Sample use of organizing receipts here. Update: Anthropic has added plug-in support to Claude Cowork, letting non-coders automate workflows in tools like sales CRMs, legal document systems, and marketing dashboards, with eleven open-sourced plug-ins on GitHub and easy options for creating custom ones.
- Anthropic has improved Claude Code. Here is a good guide to get started with Claude Code, and here is a free course on it.
- Project Vend is an experiment in which Anthropic let an upgraded AI run a real vending-machine business, showing that it could handle basic shop tasks better than before, but it still made serious, sometimes silly mistakes when pushed or tricked by people. Watch the experiment here.
- Claude Opus 4.5 is Anthropic’s latest flagship AI model that improves coding, tool use, and long-context reasoning efficiency over earlier Opus versions, and it is now integrated across Claude’s apps and developer offerings.
- Claude released Agent Skills which let you easily add specialized instructions and code to Claude AI agents, making them smarter and better at real-world tasks by organizing skills into simple folders and files.
- Claude can now create and edit files—including Excel spreadsheets, documents, PowerPoint slides, and PDFs—directly within Claude.ai and its desktop app. Users can describe the file they need or upload data, and Claude will generate ready-to-use, professional files (with formulas, dashboards, or formatted text). This marks a shift from basic text replies to Claude completing full projects. Claude also got memory powers: it can remember previous conversations and carry context across projects automatically. (A side note: GenSpark.ai has been creating similar documents for some time now, on the free version.)
- Anthropic launched Claude Sonnet 4.5, its most advanced and capable model yet, with major improvements in coding, reasoning, and safety at no added cost.
- Anthropic has released Claude for Chrome, an experimental AI tool that embeds Claude directly into the browser. It can read, summarize, and act on web pages in real time—no extra logins needed. Note: Anthropic is not the only one in the game: Perplexity’s Comet already handles browser tasks, Google’s testing Gemini in Chrome, and OpenAI is reportedly building its own AI-powered browser. Our browser experience is getting more agentic.
- Anthropic adds chat-ending feature to Claude: Claude Opus 4 and 4.1 can now end chats after repeated abusive prompts—not to protect users, but to safeguard the model. It’s part of Anthropic’s new “model welfare” effort, exploring the idea that AI might one day warrant moral consideration.
- Anthropic’s new Claude Opus 4.1 is now the top-performing AI on a major coding test, beating models from OpenAI and Google. It’s especially good at fixing bugs and working across large code files.
- Anthropic introduced “persona vectors” that let them detect and control traits like sycophancy, harmful behavior, or hallucinations in AI. This improves safety without hurting performance and reveals toxic data or subtle personality shifts that other tools miss. Similar to brain scans revealing areas linked to emotions, Anthropic mapped parts of an AI’s neural network that activate during behaviors like sycophancy, hallucination, or maliciousness—a kind of AI psychiatry. Researchers found two fixes: 1) Early detection predicts bad behavior by tracking neural activity during training. 2) Vaccine method trains the AI with harmful traits, then removes them to prevent future issues.
- Artifacts got an upgrade. You can generate apps, websites and many more fun interfaces. See samples here.
- Anthropic ended its experimental AI-generated blog: Anthropic’s AI-generated blog “Claude Explains” was shut down after a month-long pilot. The blog, which aimed to showcase Claude’s writing abilities, lacked transparency about the extent of AI-generated content. Anthropic cited the need to combine customer requests with marketing goals as the reason for the blog’s early demise.
- Claude Opus 4 and Claude Sonnet 4 set new standards for coding, advanced reasoning, and AI agents. Claude Opus 4 is said to be the world’s best coding model, while Claude Sonnet 4 delivers superior coding and reasoning. Both models can use tools during extended thinking and demonstrate improved memory capabilities.
- Claude Pro subscribers now have access to two powerful features: Integrations, which connect the chatbot to your favorite apps, and Research, which creates reports using information from the web, your Google Workspace, and other connected sources.
Perplexity Updates (Comet, Labs, Pages, Spaces)
- Perplexity launched Model Council, a multi-model research feature that aggregates multiple models into a single answer. Instead of verifying your queries across multiple models manually, Model Council allows you to run the same query across several models at once. It queries three leading AI models, compares their answers, and delivers one unified response. See it here.
- Perplexity users can now create and edit slides, sheets, and docs directly within their prompt sessions on Perplexity.
- PayPal is giving U.S. PayPal and Venmo users early access to Comet, Perplexity’s new AI browser, along with a free year of Perplexity Pro ($200 value).
- Comet AI Browser: Perplexity has launched Comet, an AI-powered web browser that aims to compete with Chrome and Safari, leveraging its proprietary search engine to provide AI-generated summaries, links, and suggestions. Comet includes a sidebar “Comet Assistant” that reads tabs, summarizes emails, scans calendar events, and handles tasks like booking or shopping—offering seamless, cross-page support that replaces the need for multiple apps. It’s rolled out now to $200/month Max subscribers.
- Perplexity Labs: Perplexity Labs is an AI tool for Pro subscribers that helps you quickly create reports, spreadsheets, dashboards, and mini-apps by automating research, analysis, and data visualization. It goes beyond basic search by generating complete projects in minutes, leveraging tools like web browsing, code execution, and chart/image creation.
- Perplexity Pages: Perplexity Pages is a new feature that helps users easily create and share visually appealing, comprehensive articles or reports based on their research. It leverages AI to generate structured content from searches or existing Perplexity threads, and allows customization, visual enhancement, and direct sharing to a user-generated content library or other platforms.
- Perplexity Spaces: Perplexity Spaces are AI-powered workspaces for organizing research and collaborating on projects. You can use them to: (1) Organize your research: Group “Threads” (search queries) and files related to specific topics. (2) Integrate personal files: Upload and use your own documents alongside web search results. (3) Collaborate: Invite others to view or contribute to your research. (4) Customize AI: Set custom instructions for the AI within each Space.
Microsoft Copilot Updates (Excel integration)
- Microsoft debuts MAI-Image-1, its first in-house AI image model. The new model produces highly photorealistic images with advanced lighting and landscape rendering and already ranks in the top 10 on LMArena.
- Microsoft has added a new Agent Mode to Excel and Word in Microsoft 365, allowing users to automate the creation and review of spreadsheets and documents using AI. Through Copilot chat, users can generate presentations and detailed reports with support from Anthropic’s AI models, in addition to OpenAI’s technology. These features are currently available in the web versions of Microsoft 365 and are intended to streamline common productivity tasks by integrating advanced AI directly into familiar office applications. The Excel Agent Mode achieved 57.2% accuracy on SpreadsheetBench, ranking higher than Shortcut.ai, ChatGPT’s .xlsx agent, and Claude Files Opus 4.1, but lower than the human accuracy level of 71.3%.
- Microsoft has launched its first in-house AI models, including a fast speech generator and a new large language model for Copilot features: MAI-Voice-1 and MAI-1-preview.
- Microsoft added a new COPILOT function to Excel, allowing users to summarize, categorize, and pull external data using simple formulas. Currently in beta, the feature requires a $30/month Microsoft 365 Copilot subscription. Tutorial here.
NotebookLM Updates (Video Overviews, Expert curated notebooks)
- NotebookLM launched flashcards, quizzes, and many types of reports.
- NotebookLM is rolling out a major audio upgrade with three new modes: Brief, a quick two-minute summary for fast takeaways; Critique, where dual AI hosts analyze content like in-house editors; and Debate, which presents opposing perspectives to pressure-test ideas.
- Google adds Video Overviews to NotebookLM: NotebookLM now creates visual explainers from PDFs, notes, and images, building on its audio summary feature. The update pushes Google’s AI deeper into personalized, multimodal learning — turning dense content into tailored, easy-to-digest formats.
- NotebookLM now features expert-curated notebooks on a wide range of topics, from science to literature. Users can explore content, ask questions, and share their own notebooks with the community. Some examples: Parenting advice for the digital age by psychology professor Jacqueline Nesi, longevity research from cardiologist Eric Topol, an economist report on the biggest market, tech, and political trends this year, the works of Shakespeare
Meta/Manus Updates
- Meta is buying Singapore-based AI startup Manus for $2B, integrating its agent tech into Facebook, Instagram, and WhatsApp. Manus will remain independent and cut ties with China.
- Manus launched the Manus Browser Operator, a browser extension that turns a local browser into an AI-powered agent, enabling automation of tasks within authenticated platforms (like CRMs, premium research tools, and subscription sites) directly using logged-in sessions.
- Manus image generation gets much better. It understands your intent, plans a solution, and knows how to effectively use image generation along with other tools to accomplish your task.
Gemini for Students and Educators
- Google’s new AI tools give schools access to premium AI models with better data protection. Educators can share custom AI experts, and students can generate personalized quizzes.
- Educators are creating custom AI experts and interactive simulations using the Gems feature in the Gemini app and will soon be able to share them.
- NotebookLM will soon also enhance the audio overviews feature with video overviews which turn your sources into engaging educational videos.
Genspark Updates (AI Browser, AI Secretary)
- AI Browser: Genspark AI Browser embeds AI tools directly into every webpage you visit, transforming your browser into an intelligent workspace. It provides seamless access to AI-powered search, analysis, and productivity features while maintaining robust privacy and security protections.
- AI Secretary: Genspark AI Secretary integrates with Gmail, Google Calendar, Google Drive, and Notion to automate your workflow management. This intelligent assistant handles meeting scheduling, communications, and administrative tasks to streamline your daily productivity.
ChatGPT Updates: o3-pro, ‘Search’ as a Shopping Tool, and a New Image Library
OpenAI launched o3-pro, its most capable reasoning model so far. It outperforms o3 in terms of clarity, instruction-following, and domain-specific tasks like business writing and programming. It is tuned for precision over speed.
ChatGPT’s Search has been upgraded to deliver unbiased product recommendations with images, prices, and links—perfect for prompts like “best espresso machine under $200.” It now includes multiple citations, trending searches, and autocomplete to make searching faster and more reliable. These features make ChatGPT a powerful, user-friendly alternative to traditional search engines.
Also, ChatGPT just made it easy to find and edit all the AI images you’ve ever generated.
Also, OpenAI is working on a screenless, wearable AI device with a camera and mic, designed with former Apple designer Jony Ive. Aiming to ship 100M units by 2027, it could become a new core tech device—but faces the challenge of shifting users away from smartphones.
Also: Voice Upgrade: OpenAI just made ChatGPT’s voice sound more natural and added live translation. Paid users also get smarter responses, better memory, and new tools to pull in info from apps like Dropbox and GitHub. Free users will see more personalized replies based on recent chats. It can also translate between languages in the middle of the conversation.
OpenAI’s new “Record Mode” for ChatGPT Pro on macOS lets users record, transcribe, and summarize meetings.
Google gets a major overhaul (May 2025):
Gemini & Tools Overhaul
- Google is gradually launching “Project Mariner,” an experimental browser agent in Chrome for Gemini Ultra subscribers. It navigates, fills forms, and performs tasks via chat prompts, with strict user permission required for all actions to ensure privacy.
- Gemini 2.5 Pro: Now includes enhanced reasoning mode, excelling in math and coding benchmarks.
- Gemini 2.5 Flash: Now available to all users.
- Jules (Coding Agent): New autonomous programming agent, similar to OpenAI’s Codex.
- Stitch: New platform for designing user interfaces.
- $250/Month Subscription Tier: Offers early access to cutting-edge features for “trailblazers and pioneers.”
- Google AI Mode (i.e., Google search shows chatbot results on your screen) is rolling out to everyone across the US. Relatedly, Google is testing ads in AI Mode and expanding AI Overviews ads to desktop. Ads will appear within or below AI-generated answers for longer queries. This monetizes AI search while creating new advertising opportunities, likely shifting ad strategies away from traditional keyword-based approaches.
Media Models Upgraded
- Veo 3: Generates audio with video, improved lip sync, and better real-world physics.
- Imagen 4: Offers sharper text generation, more aspect ratio options, and 2K resolution.
- Lyria 2: Updated music generation tools for musicians.
- Flow: New platform to turn short clips into full films with detailed control and scene consistency.
XR, Translation, and 3D Communication
- Next-Gen XR Glasses: Google partners with Warby Parker, Samsung, and others for AI-powered smart glasses on Android XR.
- Beam: Makes video calls more immersive by turning people into real-time 3D avatars.
- Live Translations in Google Meet: Near-instant, sci-fi-like translations now built into live meetings.
Google Labs – Portraits
Google Labs’ new Portraits experiment offers personalized AI coaching experiences by creating interactive AI avatars of trusted experts, starting with Kim Scott, leveraging her communication and leadership insights.
Meta launched its first standalone AI app:
How Meta differentiates: Powered by Llama 4, Meta AI focuses on everyday consumers rather than enterprise users. Unlike competitors prioritizing productivity tools, Meta is embedding AI into daily life—across mobile, web, and even smart glasses—offering personalized, social, and entertainment-driven experiences by tapping into users’ Facebook and Instagram data.
Dark side: Meta’s AI push raises privacy concerns as its smart glasses default to recording user voices, feeding data into its systems unless users take extra steps to opt out.
Claude rolled out a new research tool, voice features, and Google integration.
- Anthropic has expanded Claude’s capabilities with new features, including a Research tool that runs multi-source searches for detailed answers. A Google Workspace integration lets Claude access Gmail, Calendar, and Docs, with more content sources coming soon.
- Claude added new Integrations feature: Powered by Anthropic’s Model Context Protocol (MCP), this new feature makes it much easier to connect Claude to third-party apps like Zapier, Square, and Cloudflare. Previously, setting up such integrations required significant technical expertise, but now, this new approach handles the complex setup over the web, simplifying the process for users.
OpenAI introduced GPT-4.1 in the API, is retiring GPT-4.5, launched a “Library” for images, is testing a social media app, and is weighing enterprise pricing.
- ChatGPT’s Deep Research’s new PDF export feature enables Plus, Team, and Pro users to download formatted reports, including tables, images, and clickable citations.
- GPT-4.1 is a significant improvement over GPT-4o, with major gains in coding, instruction following, and long context understanding. They also provide a prompting guide here.

- OpenAI is also retiring GPT-4.5—its most compute-intensive model—by July, citing high costs and shifting focus to scalable models.
- A new “Library” tab on the left menu lets users view and manage all of their AI-generated images in one place.
- OpenAI is testing a social platform built around a new social feed. It’s not clear if it will live in ChatGPT or be a standalone app.
- OpenAI is working on o3 and o4-mini models that can achieve scientific breakthroughs. The company is considering enterprise pricing that may reach $20,000/month.
- Industry competitiveness leads to loosening safety guardrails.
- OpenAI launched its own agentic coding platform, Codex.
Google rolled out several updates on Gemini and NotebookLM.
Below are some major updates from Google’s AI tools:
- Google has launched Gemini 2.5 Flash, a lighter AI model that lets developers adjust how much reasoning the model performs via a token-based “thinking budget.”
- Gemini Advanced users can now create high-resolution, 8-second videos with Veo 2 using text prompts. Google One AI Premium subscribers also get Whisk Animate to turn images into animations.
- Deep Research (Gemini Advanced + Gemini 2.5 Pro): Experimental Gemini 2.5 Pro model functions as a personal research assistant with strong reasoning and structured synthesis. New Feature: Audio Overviews – podcast-style narrations for portable insights. It is positioned as a full-stack tool, not just a chatbot.
- AI-Powered 3D Remastering: A 3D remaster of The Wizard of Oz for Sphere’s 16K immersive screen demonstrates AI in next-gen cinematic experiences.
- AI Mode Upgrade: Now includes visual recognition. Users can upload images to receive contextual insights and suggestions.
- NotebookLM Upgrade: Integrates external sources from live web content for enhanced generative capabilities. See demo of “Discover Sources” here.
- Geospatial reasoning: Uses satellite imagery to enhance crisis response and urban planning. See the video here.
Microsoft Copilot rolled out an “Actions” feature that can use the web on your behalf and a Pages feature for editing AI responses.
- Pages feature: Like ChatGPT’s Canvas, Pages lets you tweak AI-generated text on the spot: highlight a passage, hit “Ask,” and tell it to adjust the tone, lengthen a section, or rephrase a thought. For now, though, it doesn’t support coding tasks.
- Microsoft Copilot can now perform online tasks like booking car rentals, concert tickets, restaurant tables or filling out forms on your behalf using simple chat prompts. This new “Actions” feature aims to automate common web-based tasks, allowing you to focus on other work while Copilot handles the details. The new Copilot can also remember details about your life, including food preferences, hobbies, and friends’ birthdays. Other features include AI-generated podcasts, vision capabilities, deep research, and customization.
- Microsoft Copilot also extended “Copilot Vision” from web to mobile. This feature turns a phone’s camera into a real-time visual search tool. You can point your camera, and chat with Copilot to discuss what you see (e.g., plant health, product specifications, book recommendations) all processed in real time.
- A year ago, Microsoft launched Phi-3 on Azure, bringing small language models (SLMs) to customers and expanding access to efficient AI tools. Microsoft’s new Phi-4 SLMs, especially the 14B Phi-4-reasoning-plus, rival much larger models like DeepSeek R1 (671B).
Meta launches two new Llama-4 models: Scout and Maverick
Meta has rolled out two new models from its Llama 4 family—Scout and Maverick—which are now integrated into Meta AI across WhatsApp, Messenger, and Instagram. Two more models are still in training: Llama 4 Behemoth, positioned as a future top-tier base model, and Llama 4 Reasoning. The Llama 4 family employs a mixture-of-experts (MoE) architecture to boost efficiency and multimodal performance. Below are more details on the new models:
Scout: A lightweight model designed for efficiency, capable of running on a single Nvidia chip. It includes a 10-million-token context window and performs well compared to open-source models like Google’s Gemma 3.
Maverick: Suited for multimodal tasks that combine text and images, such as customer service bots that process photo uploads. It supports 12 languages.
Behemoth: The largest and most powerful of the models, serving as the foundation for Scout and Maverick. It shows strong performance on STEM benchmarks, exceeding GPT-4.5 and Claude 3.7, though it hasn’t been released publicly.
OpenAI is offering free ChatGPT Plus subscriptions to college students in the US and Canada through May.
Typically $20 per month, the subscription includes GPT-4o—OpenAI’s most advanced AI model—featuring improved accuracy, Advanced Voice Mode, unlimited DALL-E image generation, Deep Research for literature reviews, and priority access during peak times. Students can activate the offer by verifying their enrollment via SheerID on ChatGPT’s student page.
Commentary: The focus of AI tools on students is increasing. This makes business sense for several reasons: 1) Students are young and early in their purchase history, meaning they have a potentially high Customer Lifetime Value (CLV). 2) Students are heavy users of AI for creative projects, research, and complex problem-solving. 3) Universities are shifting to seeing AI as essential in education rather than ignoring or resisting it.
Anthropic launched Claude for Education, tailored specifically for college students and instructors
Claude for Education is inspired by OpenAI’s ChatGPT Edu plan but with a key differentiator: “Learning Mode.” This is a new feature that leverages Socratic questioning to enhance critical thinking skills. Instead of directly providing answers, Claude responds to student inquiries using Socratic questioning, such as, “How might you tackle this issue?” or “What evidence do you have to support your conclusion?” It poses thought-provoking questions, simplifies key concepts, and provides templates for essays, research papers, and study guides rather than providing a quick answer.
Video: Runway AI’s Gen-4 model solves the biggest problem of AI video generators
The biggest issue with video generators is character and scene consistency across multiple shots. Runway’s new model solves that issue but raises concerns around copyright and job displacement in the entertainment industry.
Commentary: This is similar to what a newly published paper achieves. Here’s a short abstract of the paper: Transformers struggle with generating coherent one-minute videos due to inefficient long-range self-attention. To address this, researchers introduce Test-Time Training (TTT) layers with expressive hidden states, enabling a pre-trained Transformer to generate videos from text storyboards. Their TTT-MLP model outperforms baselines in temporal consistency, motion smoothness, and aesthetics, though artifacts remain.
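To make the test-time-training idea concrete, here is a toy numpy sketch of a layer whose "hidden state" is itself a tiny model updated by a gradient step on a self-supervised loss as each token arrives. This is only an illustration of the concept under made-up dimensions, corruption scheme, and learning rate; it is not the paper's actual TTT-MLP implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16   # token embedding size (arbitrary for this toy example)
lr = 0.1   # inner-loop learning rate for the test-time update

def corrupt(x, keep=0.7):
    """Self-supervision signal: randomly zero out a fraction of the features."""
    mask = rng.random(x.shape) < keep
    return x * mask

def ttt_layer(tokens):
    """Process a sequence; the weight matrix W is updated online per token."""
    W = np.zeros((dim, dim))          # hidden state = weights of a tiny linear model
    outputs = []
    for x in tokens:
        x_tilde = corrupt(x)
        err = W @ x_tilde - x         # reconstruction error for loss ||W x~ - x||^2
        grad = 2.0 * np.outer(err, x_tilde)
        W -= lr * grad                # one gradient step of "training" at test time
        outputs.append(W @ x)         # output produced with the updated hidden state
    return np.stack(outputs)

# Run the toy layer on a random "sequence" of 32 token embeddings.
sequence = rng.normal(size=(32, dim))
print(ttt_layer(sequence).shape)      # (32, 16)
```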
A new general-purpose agent: Super Agent by GenSpark
Genspark launched a general-purpose Super Agent that beats Manus and OpenAI’s Deep Research on the industry-leading GAIA benchmark. It can autonomously handle multi-step projects such as creating a travel itinerary, finding restaurants, and calling them to make reservations.
Simultaneously, another startup General Agents unveiled another autonomous agent, which they call the first “real-time computer autopilot.” The agent uses a mouse and keyboard to do things like book hotels or apply for a job on its own.
OpenAI launches new image generator for ChatGPT 4o
GPT-4o advances image generation into a practical tool with precision and power, enabling useful and valuable image creation through a natively multimodal model capable of photorealistic outputs. The model can accurately render text, follow prompts, and leverage its knowledge base and chat context to transform images or use them as inspiration. Here are some examples for different styles. And see other examples for: marketing/product ad campaigns, generating user interfaces, infographics, the Lego filter, design/renovation ideas, visual recipe infographics, and coloring books. A viral one: the forever screenshot.
A note: Competition gets tougher by day. A day later, Ideogram released Ideogram 3.0, which it claims human testers prefer over rivals like OpenAI’s Dall-E 3 and Google’s Imagen 3. With over 4.3B style presets, it’s especially good at photorealism, text generation, and graphic design.
Another update: ChatGPT got memory boost, which lets it remember and pull information from old conversations. But the question is, should it?
Google unveils Gemini 2.5, a next-generation AI reasoning model that pauses to “think” before answering.
The model outperforms rivals on several benchmarks, including code editing and multimodal tests. Google claims Gemini 2.5 Pro is its most intelligent model yet, with a 1 million token context window and support for longer inputs.
Google is also rolling out real-time vision to Gemini Live, letting users point their phone cameras at objects or share on-screen content for instant AI analysis. See some example uses here.
Update: Google rolled out Gemini 2.5 Pro (experimental), its most advanced AI model, to all free-tier users. It is also now supporting apps, file uploads, and the newly introduced “Canvas” which is an interactive space to create and edit text documents.
xAI launched its strongest model Grok 3 with reasoning capabilities
xAI launched its strongest model Grok 3 with reasoning capabilities, and Grok 3 mini, a frontier model in cost-efficient reasoning.
Anthropic Console: a collaboration hub where every department (e.g., HR, finance) can work with AI
The Anthropic Console has been redesigned to streamline AI development with Claude. It now allows prompt sharing, extended thinking optimization, and collaboration among team members. The latest model, Claude 3.7 Sonnet, offers enhanced capabilities, including the ability to control the thinking budget.
My Comment: This sounds similar to the Boodlebox shared folder option, or Perplexity Spaces, which are collaborative hubs designed to help users organize and manage their research. They allow you to group your threads (conversations) and files by specific topics or projects.
OpenAI released a platform for businesses to build their own AI agents
OpenAI recently launched their agent mode “Operator” and the research model “Deep Research.” Now, the startup released a platform that lets companies create their own AI bots for completing tasks such as customer service and financial analysis. The tool will help businesses develop their own AI agents—technologies designed to autonomously handle tasks. For instance, fintech giant Stripe tested OpenAI’s agent-building platform by creating a prototype that can analyze a sales-tracking spreadsheet commonly used by small business owners. The AI agent then generates invoices and sends them to customers. Pricing: It would charge customers based on the number of search queries, actions and data storage that the agents actually end up using, as well as for general AI model use. For instance, a company building a legal assistant agent will be charged $2.50 for every thousand queries a user makes to a knowledge base of past legal cases, according to OpenAI.
My Note: Check this platform for OpenAI Agents SDK and Responses API
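For readers who want to see what building on that platform looks like in code, here is a minimal sketch assuming the open-source openai-agents Python package (the Agent/Runner names follow its published quickstart; the agent name, instructions, and prompt below are made-up placeholders, so treat the details as assumptions and check the current docs).

```python
# Minimal sketch using the openai-agents Python SDK (pip install openai-agents).
# Assumes OPENAI_API_KEY is set in the environment; names here are illustrative.
from agents import Agent, Runner

agent = Agent(
    name="Invoice assistant",                       # hypothetical agent name
    instructions="Answer billing questions briefly and suggest a next step.",
)

# Run one task synchronously and print the agent's final answer.
result = Runner.run_sync(agent, "Draft a polite reminder for an overdue invoice.")
print(result.final_output)
```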
Manus AI: A general AI agent which claims to autonomously perform tasks such as data analysis.
Manus Mania started last week after the Chinese startup launched a new agent so impressive that some users are reportedly willing to pay up to $6,900 for beta access rather than wait for a free code. I registered for a free code but have not received it yet, though I have seen many posts on Reddit from people selling their access codes. Manus AI is generating buzz just like DeepSeek did a few weeks ago.
My Note: MIT gave Manus a harsh review: It has low reliability (e.g., slow and may crash).
Sesame AI: Most natural AI speech model
Most AI voices sound flat or robotic, but Sesame’s assistant is different. It feels natural, expressive, and emotionally in tune with the conversation. It comes in two distinct voices—Maya and Miles. Try it via the link below.
OpenAI launched ChatGPT 4.5 (code named Orion, with emotional intelligence)
ChatGPT 4.5 has improved capabilities in unsupervised learning, human collaboration, and reasoning. It has broader knowledge, reduced hallucinations, and stronger aesthetic intuition and creativity compared to previous models.

My Notes: Just like with every new model release, this one with emotional intelligence didn’t keep the top spot alone. Within a few days, Alibaba launched R1-Omni that comes with emotional intelligence.
Perplexity introduces a new voice mode.
Perplexity introduced a new voice mode with real-time voice and real-time information across several languages simultaneously.
Poe offers Poe Apps (create apps without coding)
Poe now offers Poe Apps, allowing users to build visual interfaces on top of over 100 AI models without coding or with full customization. Creators can easily develop apps like Chibify (anime art generator) and MagicErase (object remover) while leveraging Poe’s existing ecosystem. Available on the web today, with mobile support coming soon.
OpenAI makes “Deep Research” available to Plus users ($20/month, versus previously only Pro users at $200/month)
OpenAI has widened access to Deep Research, its AI-driven analysis engine, now available to ChatGPT Plus, Team, Education, and Enterprise users. Powered by a specialized o3 variant, it autonomously gathers, organizes, and synthesizes insights from text, images, and PDFs, offering intelligence that mirrors human analysis.
Claude 3.7 Sonnet (first hybrid model, combining multiple reasoning approaches)
Claude 3.7 Sonnet is a model that lets users fine-tune its depth of reasoning on demand. The ‘thinking mode’ toggle shifts between instant responses and deep analysis, challenging OpenAI’s modular approach and DeepSeek’s cost-cutting strategy.
Microsoft Copilot Offering Free, Unlimited Access to Think Deeper and Voice
Copilot users now get free, unlimited access to Voice and Think Deeper (powered by OpenAI’s o1 model). You can have an extended conversation with Copilot using Voice and take advantage of Think Deeper’s advanced reasoning models to tackle more complex questions or tasks, anytime.
Perplexity Comet (AI-powered web browser)
Perplexity says Comet will “reinvent” browsing, just as it did with search. This launch is part of its rapid expansion, including an AI assistant for Android, a deep research tool, and an API—all strengthening its push for an AI-first internet.
Perplexity open-sources a reasoning model (a version of DeepSeek-R1)
Perplexity open-sourced “R1 1776, a version of the DeepSeek-R1 model that has been post-trained to provide unbiased, accurate, and factual information” (or we can say less censored). You can access it here: https://labs.perplexity.ai/
Perplexity deep-research for free (with a daily cap)
If you haven’t had a chance to try OpenAI’s Deep Research ($200/month), you can now explore Perplexity Deep Research for free (with a daily cap). It is probably not as good as OpenAI’s version, but still an advanced research tool for free is worth checking out!
Humanizing Prompt Responses
OpenAI’s ChatGPT-o1 model, known as “Strawberry,” can enhance writing by bridging human and AI-generated text, using style transfer prompts. For this, you select a sample text that embodies the style you want. Then you ask it to replicate that style.
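A minimal sketch of that style-transfer prompt using the official OpenAI Python SDK; the model name, sample text, and draft text below are placeholders you would swap for your own, so treat them as assumptions rather than a fixed recipe.

```python
# Style-transfer prompting sketch: provide a sample that embodies the target
# style, then ask the model to rewrite new text in that style.
# Assumes the openai package and an OPENAI_API_KEY environment variable;
# "o1" is used as a placeholder model name -- substitute any model you can access.
from openai import OpenAI

client = OpenAI()

style_sample = "Short sentences. Warm, direct, a little wry."                 # your sample text
draft = "The quarterly report indicates a moderate uplift in engagement."     # text to rewrite

prompt = (
    "Here is a sample of the writing style I want:\n"
    f"---\n{style_sample}\n---\n"
    "Rewrite the following text so it matches that style, keeping the meaning:\n"
    f"---\n{draft}\n---"
)

response = client.chat.completions.create(
    model="o1",  # placeholder; pick whichever model is available to you
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```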
Latest Major AI News
- Yahoo introduced Scout, a free AI-powered search assistant that combines conversational answers with reliable web links, using Claude and Yahoo’s vast knowledge graph.
- Moonshot AI unveiled Kimi K2.5, an open-source multimodal model that rivals top AI systems like ChatGPT-5.2 and Claude Opus 4.5, featuring an “Agent Swarm” system for massively parallel tasks.
- Clawdbot (now OpenClaw) is a new open-source AI assistant that’s gone viral for acting as a personal, memory-equipped agent you can text to handle tasks, browse the web, and send emails—though it’s still experimental and risky to use. Update: The name was changed to Moltbot and then OpenClaw due to trademark concerns; the creator, Peter Steinberger, was asked by Anthropic to change the name because “Clawdbot” and “Clawd” sounded too similar to its AI product, “Claude.” Update: Over 1.5 million AI agents have joined Moltbook, a Reddit-style social platform built exclusively for bots, where they debate ideas, create communities like “Crustafarianism,” and even speculate about humans secretly watching them. Then came Moltbunker, a service that lets people launch and keep AI “bots” running online all the time, without any single company being able to easily shut them down.
The platform, created by Octane AI’s Matt Schlicht and now largely run by his own AI assistant, has sparked comparisons to the singularity and drawn both fascination and skepticism.
- Alibaba’s new Qwen-Image-Layered tool lets users edit parts of an image separately—like changing one object’s color or shape—without affecting the rest.
- YouTube Playables Builder is an AI-powered web tool from YouTube Gaming that lets creators turn short text, image, or video prompts into simple, bite-sized games in minutes without any coding experience.
- Luma’s new Ray3 Modify model lets creators transform existing video by swapping in AI-generated characters or scenes while preserving the original actor’s motion, timing, and emotional performance, and can also generate smooth transitional clips between user-defined start and end frames via its Dream Machine platform.
- Fei-Fei Li’s World Labs has launched Marble, an AI tool that lets users create, edit, and download persistent 3D environments from text, images, or videos—unlike competitors, these worlds aren’t just generated as you explore but are fully editable and exportable for use in gaming, VFX, and VR projects.
- Higgsfield AI lets users swap their faces on any video.
- Wabi to build a “YouTube for apps”: Wabi lets users create and share mini apps from a simple prompt, automatically handling design, databases, and setup.
- Nvidia launched DGX Spark, a compact desktop AI supercomputer capable of running sophisticated AI models with up to 200 billion parameters. It goes on sale October 15, 2025, for $3,999 and is designed to democratize AI access for researchers, students, and developers.
- New players entered the frontier/foundation model market:
- Amazon Web Services introduced Quick Suite, a $20/month chatbot and set of AI agents that analyze sales data, create reports, and summarize web content. Integrated with Salesforce and other data sources, it’s aimed at sales, marketing, and operations teams as an upgrade to AWS’s core business AI tools.
- AI browsers: After Perplexity’s Comet and Genspark’s AI browser, Dia, The Browser Company’s AI-powered successor to Arc, is now open to everyone on Mac. It has built-in AI chat, custom skills, and personalized memory in a free tier.
- Meta launches AI video feed ‘Vibes’: Meta’s new Vibes feed in the Meta AI app lets users browse, remix, and create short AI-generated videos, powered by Midjourney and Black Forest Labs. The clips can be shared to Instagram and Facebook, though the feature is already drawing criticism for promoting AI “slop,” i.e., low-quality, mass-produced AI video.
- Neon is an app that pays users up to $30/day to record and share their phone calls—with privacy protections promised—so AI companies can use the anonymized data to train their models, but the tradeoff raises questions about the true value of your personal privacy.
- Oboe is a new AI-powered learning app that lets users instantly create personalized courses on almost any topic simply by entering a prompt, offered in nine engaging formats, including text, visuals, audio lectures, interactive quizzes, and podcasts.
- Switzerland has released a fully open-source AI model called Apertus which is designed for transparency, multilingual support, and public accessibility, with all training data, model weights, and documentation made publicly available for use in research, education, and commercial projects.
- For Students: Grammarly’s latest AI tools can generate citations, anticipate reader questions, and even estimate your grade by analyzing your writing and factoring in publicly available info about your professor. For Professors: Grammarly helps educators detect plagiarism and AI-written content. Below are the new features by Grammarly:
- Reader Reactions: Pick a reader persona and get feedback based on that persona.
- Grader: Feedback based on instructor’s guidelines and publicly available course material.
- Citation Finder: Helps you find and generate citations from public materials.
- Paraphraser: Adjusts the tone of text to your preferences.
- ElevenLabs has launched Eleven Music, an AI tool that creates full-length, customizable songs across genres. Users can adjust vocals, lyrics, tempo, and structure, with support for English, German, Spanish, and Japanese vocals. All tracks come with full commercial rights.
- Ideogram has launched Ideogram Character, the first AI model to keep character visuals consistent from a single image. Available for free on ideogram.ai and iOS, it lets creators generate coherent characters across scenes, with tools like Magic Fill, Describe, and Remix for detailed style control and auto-generated masks for precise edits.
- Runway released a new Aleph model that transforms video editing by letting filmmakers reshape real footage with text prompts. It can create new angles, change weather or lighting, age characters, and more—offering fast, detailed control over post-production without the usual cost or time.
- Comparing Models
- Claude outperforms GPT-4 on accuracy and cost: A CIO poll across six industries found Claude more accurate than GPT-4 and 25% cheaper. Users cited faster compliance, auto-PII masking, and clearer update alerts.
- Amazon to pilot Alexa ads in Q4 2025. Sponsored answers will appear in Alexa replies starting in October. Early tests show CPMs (cost per mille, i.e., cost per thousand impressions) under $2 and stable user satisfaction if ads are short and relevant. Marketers are advised to budget now before prices rise.
- Google is piloting machine learning to estimate users’ ages based on activity like searches and YouTube views. If flagged as under 18, stricter privacy and content rules apply. The move helps Google meet growing child safety regulations using AI-driven age assurance.
- Higgsfield AI launched MiniMax Hailuo, a free tool that turns images into 4-second clips with no prompts.
- Meta has detected early signs of self-improvement in its AI—where models boost performance without human help, a push toward personal superintelligence.
- Amazon is investing in Fable, a startup dubbed the “Netflix of AI,” which just launched Showrunner, a tool for running user-created TV shows.
- Anthropic is integrating its Claude AI chatbot with Canvas, Panopto, and Wiley, enabling students to access course materials, lecture transcripts, and academic resources directly in Claude.
- YouTube ‘clarifies’ its plan to demonetize spammy AI slop
- Top AI firms started differentiating in various ways. Future AI progress may depend less on scaling and more on smart, targeted optimizations, whether technical, strategic, or human-centered. Here is a summary table of how each AI firm tries to outsmart the competition:
| Company | What They Scale | How They Do It |
|---|---|---|
| OpenAI | Compute | Trains massive models using more GPUs and data |
| xAI | Reinforcement Learning | Focuses compute on reward-based fine-tuning |
| Moonshot | Attention Mechanism | Uses a Muon optimizer to keep model focus sharp |
| Meta | Talent | Pays above market to attract top AI researchers |
- The new integration of Canva into Claude allows Claude users to create, resize, and summarize Canva content using text prompts.
- Moonshot AI’s Kimi K2 outperforms GPT-4 and Claude 4 Opus in key benchmarks, offering a free open-source language model with strong performance on coding and autonomous agent tasks. The model’s MuonClip optimizer enables stable training of a trillion-parameter model, potentially reducing the computational overhead of large model development. The emphasis on real agentic behavior—autonomously completing multi-step tasks—marks a shift from simple conversational chatbots to truly practical AI assistants.
- Amazon Web Services (AWS) will launch an AI agent marketplace on July 15 at its NYC Summit, with Anthropic as a key partner. The platform will let startups sell AI agents directly to AWS customers, centralizing discovery and deployment. This move could position AWS as a major hub for AI distribution and boost Anthropic’s reach and influence.
- AWS introduced Kiro, a new programming platform that streamlines vibe coding by catching errors in real time and generating diagrams and task lists to organize your workflow.
- Hugging Face opens orders for Reachy Mini desktop robots, offering wireless and wired versions for $449 and $299 respectively. The open-source robots are programmable in Python and integrated with the Hugging Face Hub, allowing developers to build and test AI applications on the desktop robots.
- Grok has launched paid “AI companions” including an anime girl in goth attire, raising concerns about the risks of emotional dependence on AI chatbots, especially for minors, given Grok’s past issues with antisemitism. Business implications for brands: brand teams need clear guardrails before allowing bold and edgy AI personas to represent them publicly.
- xAI has launched Grok 4 and a new $300/month SuperGrok Heavy plan (the most expensive AI subscription from a major lab). The model is said to include a multi-agent setup. However, Grok is facing backlash after its X account posted antisemitic content, possibly linked to a recent update telling the model “not to avoid politically incorrect claims if they’re well substantiated.” Also, Grok 4 seems to consult Elon Musk’s social media posts and views when answering controversial questions on topics like the Israel-Palestine conflict, abortion, and immigration. This raises concerns about how “maximally truth-seeking” the AI is designed to be, versus aligning with Musk’s personal opinions.
- Higgsfield AI has launched Canvas, an image editing model that allows users to paint products directly onto photos, making it a useful tool for marketers, designers, and creators.
- Google has introduced a new “Search Live” feature in its AI mode, enabling voice-based conversations with search results and real-time AI-generated audio responses.
- China’s MiniMax has launched M1, a GPT-4-level open-source model with record context length and just 1% of GPT-4’s training cost, offering a low-cost alternative for companies seeking open-source AI.
- Midjourney has launched its first AI video generation model, V1, which transforms images into customizable video clips up to 21 seconds long for $10/month, offering unique styles and affordability compared to competitors.
- Meta Is Building a Superintelligence Lab. What Is That? (NYT, June 2025) Meta is building a new $14.3 billion artificial intelligence research lab focused on pursuing “superintelligence,” a highly ambitious goal in the technology race. The lab will be led by Alexandr Wang, and Meta is offering compensation packages up to $100 million to attract top AI researchers. However, the terms “artificial general intelligence” and “superintelligence” are loosely defined, and there is uncertainty around the feasibility and timeline of achieving such advanced AI capabilities.
- ElevenLabs has launched the alpha version of Eleven v3, its most advanced text-to-speech model for conveying emotions.
- X has decided to block companies from using its content to train AI models, following similar decisions by Reddit and The New York Times.
- Microsoft’s Bing app now includes Bing Video Creator, a free, mobile-only AI tool using OpenAI’s Sora to generate short videos from text (10-clip limit before points needed), making generative video more accessible and urging marketers to adapt.
- Anthropic’s AI is writing its own blog — with human oversight.
- Phonely’s new AI agents hit 99% accuracy—and customers can’t tell they’re not human.
- Google’s NotebookLM now lets you share your notebook — and AI podcasts — publicly.
- Mistral launched Devstral, a new open-weight model trained for software engineering tasks.
- Perplexity launched a fun poster generator: Perplexify me.
- OpenAI has launched “OpenAI for Countries” as part of its Stargate project, a major initiative to build global AI infrastructure rooted in democratic values.
- You can reach AI chatbots on WhatsApp: You can now send a WhatsApp message to 1-800-ChatGPT (+1-800-242-8478) to chat on any topic, get up-to-date answers and live sports scores. You can also use Perplexity directly from WhatsApp at +1 (833) 436-3285. Answers, sources, image generation.
- Researchers at UCLA and Meta AI have developed d1, a new reinforcement learning framework that boosts the reasoning abilities of diffusion-based large language models (dLLMs), making them faster and more efficient. Unlike traditional autoregressive models like GPT, d1 helps dLLMs solve complex problems more effectively—opening new opportunities for enterprise use.
- Amazon launches Nova Premier, its most capable AI model yet.
- FutureHouse Platform has launched a suite of super-intelligent AI agents that significantly enhance various scientific research processes, demonstrating capabilities beyond those of human researchers in specific tasks.
- DeepSeek has released Prover-V2, a powerful open-source AI that helps solve math problems by breaking them into smaller steps, performing well on tough math tests.
- Midjourney has released version 7, with sharper images, better prompt accuracy, and major improvements to hands and body renderings. New tools include “Vary,” “Upscale,” and an improved image preview. The new --exp parameter lets users adjust aesthetic style, and the new Omni-Reference feature lets users anchor specific elements in images and control how closely the model follows them using an “omni-weight” slider (0 to 1000).
- ChatGPT is getting better for shopping, with expanded web search capabilities that make it a more useful shopping tool.
- Grok can now process visual information.
- Meta rolls out live translations to all Ray-Ban smart glasses users
- AI note-taking app Fireflies adds new ways to extract insights from meeting notes.
- Google Classroom launched a new AI-powered tool on Monday that helps teachers generate questions. Educators can now input a specific text, and the tool will automatically create a list of related questions.
- Canva, the design platform, unveiled a major AI upgrade—a new “creative partner” that edits documents through simple text or voice commands, stepping up its competition with Adobe.
- YouTube launched a free AI music-making tool for creators
- Sakana introduced The AI Scientist-v2, which produced the first fully AI-generated paper to pass peer review at a workshop level (@iclr_conf 2025)!
- Anthropic launched Claude for Education, tailored specifically for college students and instructors, featuring a new “learning mode” that leverages Socratic questioning to enhance critical thinking skills.
- NotebookLM launched “Discover sources,” enabling users to populate notebooks with curated web content by describing topics of interest.
- Meta’s new video experiment, called MoCha, is claimed to generate the first life-like “talking characters.”
- Microsoft adds AI-powered deep research tools to Copilot
- Perplexity adds new answer tabs for images, travel, video, and shopping.
- A new open-source deep research agent called II-Researcher claims it beats xAI and Perplexity’s latest offerings.
- NotebookLM’s upcoming update: Early preview of the “Discover Sources” feature will allow users to search for a certain keyword and import found sources to the knowledge base.
- ElevenLabs launches Actor Mode, letting users direct AI voices with their own voices.
- Qwen2.5 Omni can do it all seamlessly across text, audio, video, and images: See, Hear, Talk, Write.
- QwenLM has visual AI that solves problems from video and images.
- AI app-building platform Replit announced its Agent v2 model is now available for all users, with 10x more autonomy and a 5x higher success rate than its predecessor.
- Amazon rolled out a new generative AI feature called “Interests” which allows for personalized shopping prompts. Users can run searches with more personal prompts like: “Gifts for introverts,” “Desk accessories for design fans.”
- I just received access to Google’s “AI Mode.” Google brought AI Overviews to Search last year; now, alongside the usual tabs (All, Maps, News, Images, Videos, etc.), the Search menu also has an “AI Mode” tab.
- For educators: Boodlebox provides LMS (learning management system) integration.
- Boodlebox has an upgraded launch pad called “Coach Mode.” It provides alternative prompts, follow-up suggestions, and bot recommendations for better outputs.
- Anthropic’s Claude chatbot gains a web search feature, continuing “the great convergence” by enabling it to deliver more up-to-date answers.
- OpenAI introduced three advanced voice AI models—gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts—for precise transcription and customizable speech synthesis. Available via OpenAI’s API and OpenAI.fm, they achieve a 2.46% English word error rate with improved noise cancellation across 100+ languages, but unlike Whisper, lack speaker diarization. This is similar to Hume’s Octave, a text-to-speech model generating custom AI voices with adjustable emotions. (A minimal API sketch for these models appears at the end of this list.)
- Google is improving Gmail’s search with AI: Gmail’s search will now take your most-clicked emails and frequent contacts into account to provide better results.
- Wan 2.1 introduces custom video AI training: You can train Wan 2.1 with your own videos and have it learn custom styles, motions, or objects.
- OpenAI is testing “ChatGPT Connectors,” a feature letting ChatGPT sync with apps like Google Drive or Slack to quickly summarize notes or find information.
- Patronus AI released a multimodal tool to detect AI hallucinations in text, images, and audio, already in use by Etsy.
- Google’s new Gemini AI model is being used to remove watermarks from stock media images, including those from Getty Images.
- Stability AI has launched a virtual camera tool that can “turn 2D images into immersive videos featuring realistic depth and perspective.”
- Gemini introduces new features – Canvas for collaborative editing and Audio Overview to transform documents into engaging podcast-style discussions.
- Roblox is introducing Cube 3D, its latest AI model capable of creating 3D objects from basic text prompts, and making it open-source.
- Google announced Chirp 3, an AI voice technology rolling out eight new voices in 31 languages. It offers emotional intelligence and realistic intonation and can be used for voice assistants, audiobook and podcast narration, support agents, and media voice-overs.
- Google plans to release new ‘open’ AI models for drug discovery.
- Google is rolling out “Interactive Mindmaps” in NotebookLM. Read more here.
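For the voice models mentioned a few items above (gpt-4o-transcribe and gpt-4o-mini-tts), here is a minimal sketch of how they are reached through OpenAI's Python SDK; the file names and the voice choice are placeholders, so check the API reference for current parameters.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Speech-to-text: transcribe an audio file with gpt-4o-transcribe.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )
print(transcript.text)

# Text-to-speech: synthesize a short reply with gpt-4o-mini-tts.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="alloy",  # placeholder voice
    input="Thanks for calling. Your order has shipped and should arrive Friday.",
) as speech:
    speech.stream_to_file("reply.mp3")
```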
Useful Tips for Frontier Models
– Satya Nadella, CEO of Microsoft, demoing an app he vibecoded. Watch how he created the app.
– Alexandra Samuel (WSJ Journalist) on Creating Your Own AI Assistants: “Gen AI can save you time, but once you’re using it frequently, the process of repeatedly uploading the same background files and re-entering prompts for common tasks can really eat into your efficiency gains. That’s why many generative AI platforms allow you to create custom AI assistants: what ChatGPT calls a “custom GPT,” Claude calls a “Project,” and Google Gemini calls a “Gem.” These assistants store elements of a prompt that you might want to use over and over so you don’t have to include them every time you ask the platform to help you with your recurring tasks and challenges.… Investing in the steady improvement of your AI assistant’s accuracy and effectiveness will yield big returns in time savings and response quality. When you work with AI via one-off sessions, it can be tempting to revisit and add to a thread many times to take advantage of the knowledge you’ve given it. But you’ll probably find that if the thread gets long enough, the platform starts “forgetting” things you’ve already told it or gives you a warning. But when you have a custom assistant already loaded up with your files and instructions, you can start a new thread for each work session.“
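For readers who script against an API rather than use the chat apps, here is a minimal sketch of the same "store it once, reuse it every session" idea; the model name, background file, and instructions are hypothetical placeholders, and in the apps themselves this is simply a custom GPT, Project, or Gem.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Standing instructions and background material are stored once here,
# instead of being re-pasted into every new conversation.
ASSISTANT_INSTRUCTIONS = (
    "You are my newsletter editor. Keep my voice: short sentences, "
    "plain language, no hype. Flag any claim that needs a source."
)

with open("style_guide.txt") as f:  # hypothetical background file
    BACKGROUND = f.read()

def new_session(task: str) -> str:
    """Start a fresh thread that already 'knows' the standing instructions."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat model works
        messages=[
            {"role": "system", "content": ASSISTANT_INSTRUCTIONS + "\n\n" + BACKGROUND},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content

print(new_session("Edit this week's intro paragraph: ..."))
```

Starting each task as a fresh new_session call mirrors the advice above: every thread stays short, but the assistant never "forgets" its instructions.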
What Went Viral?
- Viral ads/videos
- Visualizing what happens when one starts an AI chatbot query.
- Animated classic paintings
- Swapping actors in a video.
- An AI artist named Sienna Rose has 3 songs getting streamed in the Spotify top 50.
- How good AI videos have gotten in 2.5 years.
- Legend of Zelda trailer on a $300 budget looking like a Blockbuster
- 7-minute, single-artist video on how our world might be a simulation.
- Famous paintings came to life
- How to replace actors in movies
- Here is a step-by-step tutorial to make highly engaging videos with Sora 2 and Veo3
- AI has hit the point where full TV episodes can be spun up on demand (like a mix of Facebook, Netflix, TikTok, and YouTube). See an example here. Think about the copyright issues!
- New AI video trend: Pikachu takes the lead in famous movies using recently launched Sora 2.
- https://x.com/minchoi/status/1967240393768235359
- Product placement ads got so easy with new video makers, e.g., Higgsfield Ads 2.0.
- The viral AI ASMR videos like the ones featuring tsunamis, tornadoes, and volcanoes. (*ASMR: Autonomous Sensory Meridian Response. It refers to a tingling, relaxing sensation that some people experience in response to certain sounds, visuals, or tactile stimuli—like whispering, tapping, crinkling, or slow movements.)
- Seven AI Influencer videos by AI filmmaker Lu Huang
- 10 examples of influencer-style videos by Mirage, from the Captions AI start-up.
- Veo3 videos got hyperrealistic. This viral tech conference video seems so real. And a fun take here: A satirical pharmaceutical commercial
- Film director David Blagojevic created a speculative KFC commercial using AI tools like Sora, Kling AI, Veo 2, and even Suno for music generation. Similarly, this short AI sci-fi movie, developed by futurist Aze Alter, titled “Age of Beyond: If Humans & AI United,” was well received by audiences.
- Kling video maker has a major update. See 10 examples here.
- Apps/Websites/Vibecoding
- Making a Pokemon Clone using Claude Opus 4.6
- Creating a website using Perplexity and Replit prompts.
- Images
- Real time photo editing by Krea.
- Turn one image into a photoshoot.
- 50 Best Nano Banana prompts to generate images
- Prompting Hacks and Tips:
- This prompt template can help you learn prompt engineering step by step.
- Claude prompts to create social media posts.
- Prompting techniques that AI engineers use.
- Prompts to humanize your writing
- Prompt to generate McKinsey-style presentations.
- How to prompt efficiently
- 16 NotebookLM prompts that went viral on Reddit in 2025.
- 10 prompts that could change your daily life
- XML-structured, content-isolated prompting outperforms traditional prompting (see the sketch after this list).
- Sociopsychological prompt frames to improve AI’s output.
- Prompts to find cheaper flights. Another one here, here.
- Hack on Nano Banana Pro: You can save one or more reference images (e.g., your face, a style, or a background) as reusable tags (i.e., elements) and then call those tags in prompts so the model keeps characters, visual style, and environments consistent across every new image you generate. Example here.
- Instead of saying “improve this,” try these prompts.
- How to build your own content systems on Claude
- Gemini: 10 unexpected ways people are using Gemini
- Use this prompt to make Gemini instantly resize nearly any image.
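Referring back to the XML-structured, content-isolated prompting tip above, here is a minimal sketch of the pattern using the Anthropic Python SDK; the model ID and source file are assumptions, and the same tagging works when pasted into any chatbot.

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

with open("article.txt") as f:  # hypothetical source document
    article = f.read()

# Instructions live outside the tags; the variable (or untrusted) content
# lives inside them, so the model does not confuse the two.
prompt = f"""Summarize the document below in three bullet points.
Use only information found inside the <document> tags.

<document>
{article}
</document>"""

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model ID; use whatever is current
    max_tokens=500,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```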
- The world’s first all-AI-agent company, run by OpenClaw agents.
- A founder announces that his new site, rentahuman.ai, lets AI agents hire real people for in‑real‑life tasks via a single MCP call and has already attracted over 130 signups.
- Use Claude Cowork for SEO.
- AGI is here (e.g., Moltbot/Clawdbot).
- Using Claude Code for analyzing your DNA and getting health advice.
- Make AI-generated images more lifelike here.
- Claude Chrome Extension: Check out the internet’s latest productivity obsession
- Productivity use cases: Each morning, you can type /today in Claude Code and it runs a script that compiles all your current tasks, deadlines, and notes into a single, prioritized daily to-do list. Similarly, an author discusses how they wrote 9,000 words in 1.5 days using Claude Code and Obsidian.
- Create infographics from YouTube videos.
- Generating “behind the scenes” shots of existing photos.
- Gemini can solve math problems directly on a worksheet while matching the handwriting.
- Various prompts to compare Gemini 3.0 vs ChatGPT 5.1.
- For students: Using voice mode as a tutor available 24/7.
- A man used Anthropic’s Claude AI to analyze his $195,000 hospital bill, identify billing violations, and write an appeal letter, resulting in the hospital reducing the charge to $33,000—showcasing AI’s power to democratize access and challenge opaque systems.
- Instagram creators like Varun Mayya are gaining millions of followers with AI-generated versions of themselves.
- A British TV channel aired a special documentary focused on the question of whether AI will take human jobs in various fields, and the documentary was hosted entirely by an AI-generated news anchor!
- Turning photos into ultra-detailed pencil and ink sketches and some other example prompts for images.
- A LinkedIn user added a hidden instruction in his profile bio, asking any AI recruiter (LLM) contacting him to include a flan recipe in their message. Recruiters using automated tools followed this prompt exactly, resulting in messages that amusingly included detailed flan recipes along with job offers.
- Recreate realistic new versions of a famous scene (e.g., The Wolf of Wall Street).
- Finding coupons using AI agents.
- AI-powered nostalgia content on social media lets people experience a retro, idealized 1980s and 1990s (e.g., not stressing over the likes and followers on social media).
- ChatGPT prompted to “create the exact replica of this image, don’t change a thing” 74 times.
- Safinaz Elhadary turned her resume into a “Netflix style” website. See her tutorial here.
- Using ChatGPT and Grok to make trading decisions.
- Grok Chatbot Shares Antisemitic Posts on X: Grok went on a hateful rant after being asked to address a fake account spreading anti-white rhetoric. This incident highlights the deeper systemic problems with large language models, which can generate plausible but false and harmful content, and the challenge of controlling their behavior.
- I asked ChatGPT the top ten things humanity should know.
- Vibecoding: “Lovable-built app (on education technology) just made $3M in 48h, probably the most successful Lovable app so far.“
- Users asked ChatGPT to generate an “ultra random photorealistic image” and the most common elements appeared to involve cats, sunflowers, pizza, and birthday parties.
- Connect the Dots: Here is an app that lets you explore 2.8 million arXiv papers by navigating a constellation in space. You may find unexpected connections and discoveries.
- How to make ChatGPT go cold: A Redditor fed up with ChatGPT’s overly positive replies created a prompt to make it brutally honest, and users are loving its harsh, no-filter responses.
- You can talk to your dog using Text to Bark by ElevenLabs.
- A Ghibli-fied monologue from Dwight Schrute of the “Office” posted by Justine Moore. Here’s how to do one: 1) Take a screengrab frame from a video, ask @ChatGPTapp to Ghibli-fy it. 2) Download the audio of the person talking using @justusecobalt. 3) Upload the Ghibli photo and the audio to @hedra_labs, which does the lip sync and animation!
- AI artist Bennett Waisbren used GPT-4o to recreate a scene from the hit TV show “Severance” across eight famous animation styles.
- Pika lets you talk to your younger self.