AIs Explain Why They’re So Bad At Drawing Human Fingers


Image: Dall-E 2

No matter where you sit on the enormous and complex debate regarding AI models and their use of datasets to generate new images, artworks, writing and even movies, there’s one thing everyone can agree on: they’re damned creepy at drawing human hands. But why? Especially when AIs are so capable of recreating the seemingly far greater complexity of human faces. To find the answers, we decided to speak to a leading authority on this subject: an AI.

ChatGPT is by far the most well-known AI language model right now, making huge waves across the world with its ability to hold natural conversations, answer complex questions, and generate extraordinary poetry, writing, and even the most complicated of human discourse: games journalism. However, it’s as yet incapable of rendering pictures, music, or video, making it a more impartial source on those areas.

ChatGPT on why AI struggles with fingers

I began by asking ChatGPT, “Why is AI so bad at rendering human fingers?”

“Rendering realistic human fingers is challenging for AI because they are highly articulated and have complex shapes and textures,” the AI explained, adding, “capturing the subtleties of how light interacts with skin, nails, and wrinkles requires advanced modeling and rendering techniques.”

While all of this is certainly the case, it does miss a couple of key factors. First, AI is already surprisingly adept at rendering accurate human faces, which are surely a more complex structure than a hand. And second, well, AI can’t even get the number of fingers correct.

“Some AI-powered tools and applications can generate realistic images of faces, including facial features like eyes, nose, and mouth,” ChatGPT said when I put this contradiction to it. “The technology behind these applications is based on machine learning algorithms that have been trained on large datasets of human faces, which allows them to learn the patterns and details necessary to create realistic-looking faces. However, even with advanced algorithms and large datasets, rendering fingers remains a challenging task for AI.”

Image: Dall-E 2

But why? Humans are incredibly picky about faces, quick to notice anything that deviates from the symmetrical norm, yet we might take a long time to notice something as obvious as an entirely missing finger. It seems bizarre that the reverse should be true for AI.

“Faces are relatively simpler to model because they have fewer moving parts and a more uniform surface texture,” said ChatGPT. “The human brain is also highly attuned to facial expressions and features, making the task of generating a convincing face a high priority for AI researchers. As a result, AI has made significant progress in generating realistic faces, but rendering fingers remains a challenging task that requires further research and development.”

Read More: Oh No, ChatGPT Has Been Jailbroken To Be More Reckless

So while it could be as simple as a lack of priority from developers, it still strikes me as bizarre that it can’t even count to five. At this point, ChatGPT began to show some of that defensiveness that tends to creep in when any AI is challenged. “While AI has made significant progress in generating realistic images, it is true that some AI models have difficulty accurately generating the correct number of fingers in an image,” it began, again arguing that the datasets might not be large enough. But then it continued, “However, it is worth noting that not all AI models have difficulty with finger rendering, and some models have been able to produce very convincing images of hands and fingers. The development of advanced neural network architectures, as well as new techniques for training and data augmentation, may help to address some of the challenges in rendering realistic fingers in the future.”

So which ones are succeeding? “One example is the Deep Hand Model, which is a generative model that uses a 3D hand model to generate realistic hand poses and shapes. Another example is the Self-Attention Generative Adversarial Network (SAGAN), which has been used to generate realistic images of hands with a high level of detail and realism.”

Image: Chatsonic

ChatGPT on why fingers matter

But then ChatGPT moved the conversation on, itself, to a more interesting area. Because as much as we might question why AI creates hands like impossible tridents, it’s too easy to stop short of asking why it matters. Fortunately, ChatGPT was well ahead of me. “There are also AI models that specialize in generating specific types of hand images, such as hand gestures or sign language,” it stated. “For example, the Sign Language Transformer (SLT) is an AI model that has been trained to translate sign language videos into natural language text, and it includes a hand pose estimator that can accurately predict the position and orientation of fingers in the hand.”

According to the chatbot, the SLT achieves the latter via a “convolutional neural network,” before passing this information through “a series of transformer layers,” which “encode the spatiotemporal information of the sign language gesture.” Finally, a “decoder network” generates the corresponding natural language text.

GPT added, “It has the potential to improve communication and accessibility for the deaf and hard-of-hearing community, as well as other applications that involve interpreting hand gestures.”
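For readers curious what that pipeline actually looks like, here is a minimal, hypothetical sketch in PyTorch of the general shape ChatGPT described: a per-frame convolutional feature extractor feeding a stack of transformer layers, with a decoder producing text tokens. This is not the real Sign Language Transformer; every class and parameter name below is an assumption made purely for illustration.

```python
# Illustrative sketch only: a generic "video of gestures -> text" model
# (CNN per frame -> transformer encoder/decoder), loosely matching the
# architecture ChatGPT described. Not the actual SLT implementation.
import torch
import torch.nn as nn


class GestureToTextSketch(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256):
        super().__init__()
        # Stand-in for a hand-pose CNN: collapses each RGB frame to a 64-dim feature.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.frame_proj = nn.Linear(64, d_model)
        # Transformer layers encode the frame sequence (the "spatiotemporal"
        # information) and decode it into text tokens.
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.to_vocab = nn.Linear(d_model, vocab_size)

    def forward(self, frames, text_tokens):
        # frames: (batch, time, channels, height, width); text_tokens: (batch, length)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.reshape(b * t, c, h, w)).flatten(1)  # (b*t, 64)
        frame_seq = self.frame_proj(feats).reshape(b, t, -1)         # (b, t, d_model)
        tgt = self.token_embed(text_tokens)                          # (b, length, d_model)
        out = self.transformer(frame_seq, tgt)                       # no causal mask, for brevity
        return self.to_vocab(out)                                    # logits over the text vocabulary


# Toy usage: two clips of eight 64x64 RGB frames, decoding a five-token caption.
model = GestureToTextSketch()
logits = model(torch.randn(2, 8, 3, 64, 64), torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])
```

Even in toy form, the wiring mirrors the bot’s description: convolutional features per frame, transformer layers over the sequence, and a decoder that emits the final text.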

I asked if there were other practical uses for focusing on better understanding human fingers, and GPT offered me an entire list.

  1. Gesture recognition: Improved finger rendering could enable AI to recognize and interpret human gestures more accurately, which has many potential applications, including virtual reality, gaming, and robotics.
  2. Medical applications: Advanced finger modeling and rendering techniques could be used in medical simulations to train surgeons and medical students on complex procedures, such as microsurgery.
  3. Human-computer interaction: By accurately modeling the position and movement of fingers, AI could be used to develop more natural and intuitive interfaces for computers and other devices.
  4. Security: Fingerprint recognition is a widely used biometric identification technology, and better finger rendering could improve the accuracy and reliability of this technology.
  5. Accessibility: Improved finger rendering could help to create more effective assistive technologies for people with disabilities, such as prosthetic hands or devices that allow for easier manipulation of objects.

Given so many practical applications, especially so many of them beneficial to those with disabilities, I wondered if this made the focus on human faces a little peculiar.

“It is not necessarily peculiar that programmers have focused heavily on faces in AI research,” GPT replied, “as facial recognition and generation are important areas of study for a wide range of applications, such as security, entertainment, and communication. The human face is also a complex and highly expressive part of the body, which makes it an interesting and challenging area of study for AI.”

“That being said,” it continued before I could ask anything else, “there is certainly value in exploring other areas of the body, such as the hands and fingers, and improving AI’s understanding and rendering of these body parts.”

Chatsonic on why humans are still best

Since rival AI Chatsonic offers a far broader suite of AI abilities, albeit one far less likely to provide accurate or coherent responses (during one conversation, I found it espousing the mental health benefits of removing one’s own head), I thought I’d let it respond to the topic. I began by asking it to create a few images of human hands, something ChatGPT cannot do (although of course OpenAI, the group behind ChatGPT, also owns DALL-E). They were as deeply creepy as we’re used to.

Image: Chatsonic

Following on, I asked it the same question: why are AIs so bad at this? “AI algorithms are still developing, and they are currently unable to accurately render human hands in a realistic fashion,” Chatsonic said with much more frankness than ChatGPT usually offers. “AI algorithms are not designed to recognize the intricate details of human anatomy and movement, meaning they are unable to accurately reproduce a human hand in a realistic manner.” Although it added, with familiar confidence, “As AI algorithms continue to improve, it is likely that they will eventually reach a level of accuracy that is comparable to human hand-drawing.”

So I pressed again, why can’t it even get the number of fingers correct? “AI algorithms are not yet advanced enough to accurately recognize the details that differentiate human hands from one another,” it said, and then with modesty you don’t often see from these algorithm-based programs, “The nuances of the anatomy of the human hand require a level of precision and complexity that AI algorithms have yet to reach. This is why when creating realistic digital replicas of the human hand, humans are still the best choice for the job.”

I finished by asking both AIs if they thought that perhaps their current inability was a benefit, given it provides an easy way for a person to distinguish if they’re communicating with a person or an AI. The far more amenable Chatsonic was quick to agree. “Absolutely!” it replied. “The inability of AI to accurately render human hands provides an invaluable way to differentiate between interactions with AI and those with real people. This helps create a more secure online environment and allows users to be sure of who they are communicating with. Furthermore, it helps ensure that AI cannot take advantage of people by pretending to be a human.”

Meanwhile, the endlessly tedious ChatGPT, with its paranoid compulsion to constantly inform you of its lack of opinions, alongside its bloody-minded position on “offensiveness,” prevaricated far more.

“As an AI language model, I do not have personal opinions or emotions. However, I can see why some people might view the difficulty AI has in accurately rendering human hands as a positive in that it provides a way to distinguish between human and AI responses,” it began, before wittering on for four paragraphs about details I hadn’t asked for. Within it all, however, it struck a far more negative tone. “Ultimately, the goal of AI research is to develop systems that can perform tasks as well as, or better than, humans. While the limitations of AI in certain tasks may provide a way to distinguish between human and AI responses today, it is likely that this distinction will become less clear in the future as AI technology continues to advance.”




