Devin's Demo As The "First AI Software Engineer" Was Faked

Today, I experienced a sense of betrayal in my role as a technical writer focusing on generative AI. It is my duty to provide accurate and up-to-date information to my readers, both technical and non-technical, about the latest developments in the rapidly evolving field of artificial intelligence.

Last month, I was excited to write about Devin, billed as the first AI software engineer and developed by a Bay Area startup called Cognition. Among its many advertised features, the one that caught my attention was its claimed ability to complete tasks on Upwork. Impressed by this, I shared my thoughts in an article titled "The First AI Software Engineer Is Here".

However, today, a YouTuber named Karl analyzed the demo video and debunked the claim that Devin could complete and get paid for freelance jobs on Upwork.

In his video, Karl examined the claims made about Devin, particularly the promise that viewers would watch Devin earn money by handling complex tasks on Upwork. He went through the tasks Devin was shown performing in the promotional video one by one, revealing discrepancies and exaggerations in its capabilities.

Karl demonstrated that Devin did not deliver what was specifically requested in the Upwork tasks, and instead performed simpler, unrelated tasks that were falsely represented as significant achievements.

This misrepresentation has led many non-technical people, and even some technical ones, to believe that AI might soon replace programmers, which is not an accurate reflection of the current state of AI technology.

In addition to the fake demo of Devin doing an actual Upwork job, Karl also pointed out mistakes in the coding examples shown in the video:

The files Devin was shown editing in a GitHub repo don't actually exist in that repo, and some of the errors it fixed were nonsensical and unlikely to be made by a human developer.
Devin's code changes were suboptimal, such as writing a low-level file read loop instead of using the standard library properly.
Although the demo makes it look like Devin finished the task quickly, the timestamps in the chat show the work stretching over many hours and even into the next day. Karl, by comparison, was able to complete the requested task himself in about 30 minutes.
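Devin ran nonsensical shell commands such as `head -n 5 foo | tail -n 5`, where the `tail` step is redundant because `head` has already cut the output down to five lines.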
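
To make the last few points concrete, here is a minimal Python sketch. It is my own illustration, not code from Devin's demo or from Karl's video, and the helper names and the file name `foo` are hypothetical. It mimics `head` and `tail` on a list of lines to show that the piped command returns exactly what `head` alone returns, and it contrasts a hand-rolled read loop with a single standard-library call that produces the same bytes.

```python
import os
import tempfile
from pathlib import Path


def head(lines, n):
    """Mimic `head -n N`: keep the first n lines."""
    return lines[:n]


def tail(lines, n):
    """Mimic `tail -n N`: keep the last n lines."""
    return lines[-n:] if n > 0 else []


def read_low_level(path, chunk_size=64):
    """Hand-rolled chunked read loop (illustrative of the criticized style)."""
    chunks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            chunks.append(chunk)
    return b"".join(chunks)


def read_with_stdlib(path):
    """The same result with one standard-library call."""
    return Path(path).read_bytes()


def demo():
    """Return (pipe_is_redundant, both_reads_match) for a 20-line sample file."""
    lines = [f"line {i}" for i in range(1, 21)]

    # `head -n 5 foo | tail -n 5` versus plain `head -n 5 foo`
    pipe_is_redundant = tail(head(lines, 5), 5) == head(lines, 5)

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo")  # hypothetical file name
        Path(path).write_bytes(("\n".join(lines) + "\n").encode("utf-8"))
        both_reads_match = read_low_level(path) == read_with_stdlib(path)

    return pipe_is_redundant, both_reads_match


if __name__ == "__main__":
    print(demo())
```

Neither pattern is a bug in the strict sense, since both produce correct output. The point is that an experienced developer would rarely write either one, which is why they stand out in a demo meant to showcase engineering skill.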

Medium writer Devansh also conducted his own investigation of the video and shared his findings in an article titled "Did the makers of Devin AI lie about their capabilities?".

As a technical writer who was fooled by the misleading demo, I take this news personally. It is disheartening to realize that I unintentionally contributed to the spread of misinformation, which can harm public understanding of AI's current capabilities and lead to a distorted narrative about AI replacing human jobs.

Given the amateur mistakes made by Devin, it seems unlikely that it is using top AI models like GPT-4, Claude Opus, or Google Gemini. This raises questions about the technical competence and honesty of Cognition.

I am curious to see how the Devin team will respond, if at all. Will they acknowledge the discrepancies and apologize for the misleading demo, or will they double down on their claims?

What is AI Washing?

AI washing, a term coined by research firm Cognilytica, refers to the practice of companies overhyping their products' AI capabilities to attract investment and mislead consumers. This phenomenon is a growing concern in the industry.

Remember Google’s demo video of Gemini?

Google's demo video of Gemini, which was meant to highlight its multimodal capabilities, was another example of AI washing. The company later admitted that the demo was not a live recording: it was created by capturing footage to test Gemini's capabilities and then prompting the model with still image frames and text.

The Humane AI Pin, a recently viral wearable computer with a built-in AI assistant, camera, and projector, is another product that has been criticized for overhyping its capabilities. A review by popular YouTuber and tech product reviewer MKBHD highlighted the device's slow and often incorrect responses to commands, as well as its poor battery life.

MKBHD on his review of the Humane AI Pin

MKBHD's honest review of the Humane AI Pin has caused quite a stir, with some even accusing him of causing the collapse of Humane. However, a quick Google search for "Humane AI Pin review" reveals that traditional media, independent bloggers, and YouTubers are unanimous in their criticism of the device.

Going back to Devin, while the demo was indeed misleading and exaggerated, it is important to remember that LLMs and AI agents are still relatively new and rapidly evolving technologies. Even bigger companies have not yet released AI agents that are significantly faster, more efficient, and more accurate than human workers on platforms like Upwork. As Devansh mentioned in his blog post, we are essentially trying to build skyscrapers on shifting sands.

However, it is also important to note that AI is improving at a rapid pace, and we can expect future iterations to be better and more reliable.

Final Thoughts

This experience serves as a reminder to approach tech companies' advertisements about their products with a healthy dose of skepticism. Companies will often go to great lengths to generate hype and buzz around their products.

Companies developing AI tools owe their users honesty and transparency, and the media and influencers who promote those tools share the blame when they repeat claims without verifying them.

I urge users, developers, and content creators to stay skeptical and stick to what can be verified, so that misinformation does not spread and distort how society understands this technology.

Finally, if you are an investor, beware. I expect overhyped AI demos to become a common way of attracting your money, and some of them could end in a rug pull.

Stay ahead. Stay updated.