Scott Hardie | May 2, 2023
This video might not look like much on the surface, but it demonstrates the future of video games and it's very exciting:

That's a preview of a user mod that's being developed for Skyrim VR, which is itself already an immersive VR game: put on the headset and headphones, and you experience the fantasy world from a first-person perspective, with NPCs (non-player characters) looking you in the eye when they talk and so on. This user mod (additional code created and shared by players with the blessing of the game company) will add several more features:

First, there's speech-to-text through the microphone, so that what you say (such as "were you ever told campfire stories?" at the beginning) can be interpreted by the game. Some modern game consoles build microphones into their controllers for online chat with other humans, but solo games like Skyrim rarely use them.

Then there's text-to-speech, so that text generated in-game can be read aloud by the characters in the world. I've played Skyrim, and I recognize the robotic speech in that YouTube video as the distinctive voices of characters like Lydia and Farkas, who normally speak only scripted lines recorded by voice actors. I bet that all of their recorded dialogue from the real game has been fed into an AI model that generates new speech in their voices. The video title also indicates that the team is still working out details like lip syncing.

Then there's a chatbot to generate dialogue to be spoken by the NPCs. That's a great idea, allowing for spontaneous and plausible dialogue based on spoken prompts from the player. And much like how the speech software can produce audio that sounds like the character's voice, the chat AI could be fed all of the scripted dialogue for a character from the real game in order to produce new dialogue that mimics their speaking style. In that video, some of the characters sound more like themselves than others; to me, Danica (the woman wearing the golden shawl) sounds the closest to the real thing.

And the most exciting part to me, the truly next-level aspect of this, is the in-game awareness. Without it, you're limited to abstract conversations, the same as with any plain-text chatbot. But being able to hand a sword to an NPC, ask "what do you think of this?", and hear them voice an opinion, or to ask a shopkeeper about their hours (a real thing in the game, since shopkeepers walk home at night and return in the morning) and get a mispronounced but still accurate answer: that's exciting. Right now, the limitations of Skyrim (which came out in 2011, long before any of this was foreseeable) restrict these conversations to a "spell" that the player must initiate in-game with a "magic sword", if I'm understanding the video correctly. But you can see how a future game could integrate the technology so that it happens spontaneously: perhaps you knock over a flagon in the tavern and someone says "watch it!" or (smarter) "that's the third time tonight! it's time for you to go home" or (even smarter) "how could you be this clumsy today after you slayed that dragon on that nearby mountain last week?"

It's especially interesting to me that this project is being done in Skyrim, part of the Elder Scrolls series. Its 1990s predecessors in the series, Arena and Daggerfall, had absolutely gargantuan worlds to explore that were mostly procedurally generated and thus not terribly interesting, since each location felt like the same elements over and over again in a new random arrangement. In 2006, Oblivion introduced NPCs with a kind of free will thanks to AI: each character had personality settings, such as how compassionate or how responsible they were, and would then behave in the world according to their needs, such as spontaneously stealing to feed themselves or joining the game's hero in a fight against a dangerous animal. Today that's not very impressive, but at the time it was revolutionary. What undermined it a little were the technological limitations, such as needing actors to pre-record every line the characters could say, which led to robotic conversations that have been mocked in memes ever since. In Skyrim, NPCs react to you, and any of my examples above with the knocked-over flagon could already have appeared in that game, but only if the company had anticipated it in advance, recorded dialogue for it, and coded for it, which means that NPCs could acknowledge maybe only 1% of the things you did. Having AI generate potentially any speech about literally anything you do in the game, in the style and voice of a defined character, especially if it remembers your previous actions and knows of the famous things you've done in the game world, would make the game world feel alive in an exhilarating new way.

And ideally it would go a step further by tying these advancements into NPCs' actions beyond conversation: you might tell someone "hand me that coin on the floor," and they might obey, depending on their free will, or tell them "I think I see a rat in the corner," and they might scream, flee, try to kill the rat, thank you for warning them, or curse you out for pranking them. The more you get to know a character, the deeper and more personal your relationship, and thus your interactions, could become. Imagine the infinite possibilities.

The video above is obviously still very rough. The generated speech sounds mechanical, the wait times for a response ("let me think" indeed) are way too long, and the NPCs are still limited to mere conversation. But you can see how a few years of refinement and further advances will smooth out these issues. Watching it feels to me like one of those mind-expanding, once-in-a-generation moments I've had in games, such as the first time I played in a true 3-D world or the first time I saw real film footage in a game. What do you think of this?

And just to be clear: I'm sure that this kind of project is being developed all over the video game industry, by companies and by users alike, not just this one project within the twelve-year-old Skyrim. This video just happens to be the example that I came across yesterday.

