The battle between OpenAI and Google saw its latest episode this past fortnight, with both announcing new capabilities and initiatives. Multimodal? Check. Video generation? Check. Supportive agents? Check. We are very much in the Messi-Ronaldo or Federer-Nadal era of AI. The good news is that, as with any such hyper-competitive era, we the consumers get to enjoy the benefits.
Broadly, the race is along three key axes.
Multimodal: The first is multimodal - both Google and OpenAI have made strong plays here. Google’s Veo project is similar to OpenAI’s Sora: both generate video from text inputs. But to be truly multimodal you need to accept inputs that are text, audio, video, or code, and generate output that can be any of these as well. This is where OpenAI has launched GPT-4o to the public (o = omni), which can reason across text, audio, and video in real time.
Agents: The second is helpful agents: turning the rich capabilities into agents that can actually perform tasks. Capabilities don’t always translate to effective help, because simple human tasks can be weirdly complex for automated agents. One of the advantages of the broad intelligence of AI tools is the ability to execute these tasks, such as ‘call the help centre for xyz company and log a complaint about the printer not working’. This can suck an entire hour from your day, but a voice-based AI could execute it, as demonstrated by OpenAI and by Google’s Project Astra.
Performance: The third axis is the less visible one, but arguably the most critical: the under-the-hood engineering and performance management. Google’s milestone was the announcement that Gemini Pro will have a context window of 2 million tokens. This essentially means that Gemini will have a much greater ability to absorb and retain the background information of your conversation, and to remember what you said over its course.
Adoption As A Differentiator
I personally find a few challenges that we still need to overcome. Here are 3:
Brand Confusion: I’ve completely lost track of all the products, their names, their sub-brands and features, and which does what. I’ve read about Gemini, Astra, Veo, Imagen, Sora, ChatGPT, GPT-4o, DALL-E - and that’s just from the two major players. And I’m not sure exactly which does what, which ones are just research projects, which ones are announced but not released. Or available only in the US. Which ones are going to get folded back into other products. Some of this is academic, but the truth is that we as consumers like to invest in a relationship with the brands we favour. I don’t know which of these I should pin to my task bar, which ones I’m going to count on using this time next year. Which ones are just here to tickle my curiosity. And which one is just a marketing gimmick. You get the gist.
Learning: It’s going to take us time to learn to use these tools well. To ask the right questions, to trust them with our information. To be able to use the outcome well. To reshape our lives in small and big ways so these become a part of our daily routine. And given that some of this involves a change in behaviour, there’s an element of awkwardness. Even when I’m alone I struggle to ‘talk’ to my phone or to ChatGPT.
Choices: Some of this is like the genie that appears and offers us 3 wishes. It would take me days, if not weeks, to figure out exactly what those 3 wishes should be, in order to avoid buyer’s remorse. In much the same way, we’re faced with these wondrous toys and we might feel a bit overwhelmed about where to start. What part of our lives should we look to improve first?
None of these are very difficult to overcome. But the race might in the end be won not by the competitor who has the better product, but by the player who makes adoption easier. And this is where I think Google is suffering from portfolio complexity. OpenAI’s products are more easily usable, the releases are unequivocal, their videos are less staged, and the portfolio is simpler to understand.
What Does This Mean for Human Assistants?
Every step of progress in AI capabilities seems to cast a shadow on an additional layer of jobs: a few million more people feel threatened by the capabilities of AI, and their long-term prospects are questioned. The rise of agents in the AI world will definitely impact a whole range of assistants and entry-level jobs.
For this discussion, we probably want to think of two different kinds of assistant roles. The first kind is the career assistant - who is likely to spend most or all of their career in assistant or secretarial roles. The second kind is the assistant as a learning role. The kind of apprentice roles that exist in many industries. Like runners in the media business, or paralegals in law firms.
The first kind, the career assistant’s role, is definitely under threat. AI could diminish the role, or the role might end up being a fusion of person and AI. There will definitely be fewer such roles. And these assistants will need to acquire new skills, perhaps including how to work effectively with and coordinate a swarm of AI agents.
The second kind, the learner, is more complex. Companies will need to ensure that in their need to automate and streamline, they don’t undermine their own future by eradicating a learning path for future managers and experts. Perhaps these roles will become more curated. They will be defined with more thought, rather than just having people thrown into the deep end and learning to survive. Perhaps AI agents will be used as buddies to help junior people learn faster, and grow in more specific areas. I see a scenario where the learning is accelerated, so growth is faster: someone could achieve a career milestone in terms of designation, seniority, or expertise much faster than they are expected to today. But as you can imagine, there’ll be fewer of them too.
We’ve looked at assistants here, but you can probably consider a much wider range of roles through this lens.
What Does This Mean for Humans Overall?
We place a lot of importance on jobs, careers, the dignity of labour, the importance of work and career as a part of our identities. Complex social and economic structures are built around the idea of work, professions, corporations. Urban culture, the idea of work-life balance, the social structures between white collar and blue collar jobs, and even much of our art and entertainment revolves around the idea of work as we know it. But all of this is a creation of the industrial era.
The industrial revolution created the giant machine that is industrial work. In this machine, human beings are for the most part cogs, required for only a tiny percentage of their brain power. We could argue that the industrial revolution turned humans into robots long before technology came along to do those robotic jobs better. Think of production lines, administrative work, the quintessential back office worker, the cashier, the data entry job, payroll processing, the file clerk, and the transcriptionist. In fact, when you break it down, a lot of middle management jobs in traditional businesses also involve only a fraction of the brainpower people possess.
Is it a bad thing therefore that we don’t use humans for these jobs? Wouldn’t we all agree that freeing humans to do more worthwhile jobs and activities is good for their wellbeing?
Ah, but the problem is that people still need to sustain themselves, so if they’re not doing these jobs, how will they earn money and survive? And just as fundamentally, what will they find meaning in, if not productive work? Both of these questions are harder to answer.
Economics: For the first one, consider this economic perspective on human labour. At the start of the industrial revolution, the world’s population was about 1 billion. But as the demand for humans grew, so did the supply. Along with the decline in wars and improvements in healthcare and life expectancy, the supply of humans soared to meet the demand. As the future demand for humans drops in a post-industrial, AI-dominated world, could the supply also decline? This would amount to a macro-economic solution to the loss of jobs. It certainly wouldn’t help the individuals who found themselves replaced by an AI agent. There is a view that AI, like all previous technologies, will create more new jobs than it removes, but we must consider the scenario where this is not the case, because AI gets better fast enough to not need the additional people for the new roles. Of course, in all these scenarios, the picture may be bleak for some individuals in the short term. Which is why some countries are trialling universal basic income.
Meaning: The second one is also easier if you consider the big picture rather than individual examples. Human beings can throw themselves into a range of worthwhile activities - art, music, exploration, discovery, science, community work, caring for each other, restoring the planet, space travel, and much more. But of course, none of those may be appealing options to the 55-year-old man who has spent all his life identifying himself with his coding job. This is a bit like taking a captive animal and releasing it in the wild. The animal may not survive. But if you released every captive animal, the species might flourish over time.
So the big picture may make sense, but there are certainly scenarios where the short term is less than rosy for some people caught in the crosshairs of rampaging AI efficiency and capability.
More AI Reading
Nvidia still rules the roost, but many startups are trying to be challengers by working a generation ahead and betting on the future. Cerebras, Groq, and MatX, to name a few. So are Google, OpenAI, AMD and others. The attack is largely along the lines of eliminating the inefficiencies caused by the accidental, imperfect fit between GPUs and AI requirements. (The Economist)
AI Prosthetics: Sarah lost her arm in a terrible accident, but a robotic prosthetic powered by AI has given her back the use of a right arm, letting her make coffee or straighten her hair. The arm, made by Covvi, uses AI software that learns her movements, so the more she uses it, the better it gets. (NYT)
AI Industry: In the slipstream of AI, a lot of sectors and companies are making their fortunes. The demand for AI has powered Nvidia to the pinnacle, but behind the demand for chips lies the demand for materials, energy, and supply chains. This is another mini industrial revolution in play. (WSJ)
Other Reading
James Webb Space Telescope sees black holes merging at the dawn of time. (LiveScience)
Screenless Cities: I’m not quite sure whether I agree with all the arguments here but it is definitely true that we are increasingly conditioned to see the world through screens, and reducing this kind of tunnel visioning may not be a bad thing. (Medium)
De Beers at a crossroads. Artificial diamonds are making inroads into the market. The company’s owners want to offload the business. Diamonds might be forever, but what about De Beers? (The Economist)
Thanks for reading and see you soon!