For this month's feature, we're honored to have Richard A. Bartle's thoughts on voice communication in multiplayer online games.
By Richard A. Bartle
When I first heard that the X-Box would support real-time voice communication between players, my heart sank. It didn't sink because of the effect it would have on X-Box games; it sank because of the effect it would inevitably have on virtual worlds.
You can see how the logic goes. "Virtual worlds are multi-player computer games. The latest multi-player computer games feature real-time voice communication, and they're a blast! Players are coming to expect it, current virtual worlds don't have it. Hey, we should put voice in the new virtual world we're developing and blow away the competition!".
This is so depressing…
Newbies, I can forgive. Newbies have heard that virtual worlds are fun. They're looking to play one, but they have no easy way to determine which of those on offer is the best. Are they going to head for the one with all the latest bells and whistles, or the one that looks like it was created in 1999? Are they going to play the game written for broadband or the game written for 28.8K baud? Even if they've never experienced X-Box Live, they may have read reports extolling the amazingness of trash talk in Unreal Championship. Who wants to read, or use keyboards?
Designers are not newbies. Designers should know better. Maybe some of them do, and are right now locking horns with the marketing director and threatening to resign over the issue. Yeah, like that's going to happen…
The thing is, most designers of virtual worlds don't know enough about what they're designing. Design is about consequences. One of the consequences of adding real-time voice communication to virtual worlds is that it will attract newbies; this is why marketers want it. Another of the consequences is that when players cease to be newbies they won't stay for as long; this is why designers should be telling marketers they can't have it. Unfortunately, many of them don't give a moment's thought to the possibility that real-time voice communication might be A Bad Idea for virtual worlds. This is what's so depressing: it exposes just how little they grasp about their craft.
A know-nothing designer works on instincts acquired from being a player. They'll remember going on plane raids with their group, imagine how cool it would have been if they'd been able to talk to one another, and relish the thought of creating a virtual world where this would be a reality.
A more thoughtful designer might vaguely be aware of the concept of immersion, and have an inkling that real-time voice from players could present them with some difficulties in that area. They may also have some dim recollection of the importance of anonymity (or pseudonymity) in virtual worlds. However, these are bridges that can be crossed when they come to them, once the rest of the design has been fleshed out. No need to worry, la la la.
Designers who know their trade will realise that the introduction of voice - real-time or otherwise - will seriously influence the way their virtual world is played. They will also have absolute confidence in their ability to design round the problem, however. A little modulation here and there will give people voices that aren't their own. Sure, there may be some teething problems tackling the abuses that are certain to arise - it's an awful lot of data to log - but nothing intractable. Audible channels can be gagged as easily as textual ones. Things will work out.
Designers who simply understand will recoil in horror, despairing that anyone could even contemplate such an immersion-busting, reality-intrusive, anti role-playing debasement of what virtual worlds are. Don't these fools see what damage they're going to do?
Virtual worlds are just that, virtual. People play them to get away from reality; they play them to get away from themselves. In a virtual world, you can be someone else. By being someone else, you can become a better you. Why do people play the same game for hour after hour, night after night, for week after week, month after month? It's not because they like the game; it's because they like who they are.
Designers who don't understand that should go away and not come back until they do.
If you introduce reality into a virtual world, it's no longer a virtual world: it's just an adjunct to the real world. It ceases to be a place, and reverts to being a medium. Immersion is enhanced by closeness to reality, but thwarted by isomorphism with it: the act of will required to suspend disbelief is what sustains a player's drive to be, but it disappears when there is no disbelief required.
Adding reality to a virtual world robs it of what makes it compelling - it takes away that which is different between virtual worlds and the real world: the fact that they are not the real world.
Voice is reality.
"But it's not your voice". Well yes, gee, instead of sounding like I do in real life I can sound like someone in real life does after they've had their voice put through a processor. It fools no-one. Besides, even if the pitch changes were good enough to make men sound like women and vice versa (which they aren't), it wouldn't alter accents. "Hey, this elf babe is from England!". Hello reality.
This is what we're going to get. Virtual worlds will appear with voice. They'll attract newbies. They won't hold these players, but they'll condition them to expect voice in whatever virtual world they decamp to instead. To compete for newbies, new virtual worlds - and perhaps some well-financed older ones - will also add voice. Eventually, they'll all have it, their players will all be unsatisfied because of it, and everyone will wonder what the fuss with virtual worlds was all about. They're just like regular multi-player computer games except with more players.
I'm being pessimistic, I know, but still… Are virtual worlds as we know them doomed?
Fortunately, no, they're not. It's not that we shouldn't have voice in virtual worlds; it's that we shouldn't have it yet.
Voice isn't in itself any more disruptive of the virtual world experience than are photo-realistic graphics. It's fine to fool the senses, to make virtual worlds appear to be real, so long as that final step - their actually being real - is not taken.
Here's a look into the future…
Even if voice becomes the norm in virtual worlds, text as a means of communication will still exist: not all players will be able to use voice. My wife can watch TV while I visit virtual worlds, but she wouldn't be able to if I were talking the whole time in the next room - it would be way too annoying. So I'd have to type; so would plenty of other people.
Having two distinct input channels - typing and speaking - is non-problematical, because no player experiences both simultaneously. Output, however, is problematical. If I'm talking with text and someone else is talking with voice, the person being talked to must read some conversations while listening to others. Ideally, they should either read them all or hear them all. As to which, well, you could make it a switch: those that prefer to read could see spoken words rendered into text; those that prefer to listen could hear written text rendered into speech.
OK, the technology to do this isn't quite there yet, but suppose it were. You'd have something that converts speech to text and something else that converts text to speech. So in theory, I could say something in my male, English voice, it could be converted into text, then replayed to listeners in a female, New English voice. It would be real-time voice communication, but no more "me" than my graphical avatar: just clothing for an alternative identity.
It works because it sounds real, but we know it isn't (hence we have disbelief to suspend). It works because it permits us to role-play (to become the someone that we want to become). It just works.
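The read-or-listen switch described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `recognise` and `synthesise` are hypothetical placeholders standing in for real speech-to-text and text-to-speech engines, and the `Utterance` type and voice preset names are invented for the example. The point it shows is that once everything passes through text, the listener chooses the output form and the speaker's real voice never leaves their machine.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # character name, not the player's real identity
    text: str     # the words, whether typed or recognised from speech
    voice: str    # voice preset chosen for the character (hypothetical)


def recognise(audio: bytes) -> str:
    """Placeholder for a speech-to-text engine (assumption, not a real API).

    For the sketch, the 'audio' is simply its own transcript as UTF-8 bytes.
    """
    return audio.decode("utf-8")


def synthesise(text: str, voice: str) -> str:
    """Placeholder for a text-to-speech engine (assumption, not a real API).

    Returns a tagged string where a real system would return audio samples.
    """
    return f"<audio voice={voice!r}>{text}</audio>"


def capture(speaker, voice, typed=None, spoken=None) -> Utterance:
    """Turn either input channel (typing or speaking) into the same text form."""
    text = typed if typed is not None else recognise(spoken)
    return Utterance(speaker, text, voice)


def deliver(utterance: Utterance, prefers_audio: bool) -> str:
    """Render one utterance in the form this particular listener asked for."""
    if prefers_audio:
        return synthesise(utterance.text, utterance.voice)
    return f"{utterance.speaker}: {utterance.text}"


line = capture("Elfbabe", "female-new-england", spoken=b"Hello there")
print(deliver(line, prefers_audio=False))
print(deliver(line, prefers_audio=True))
```

Because the text in the middle is the only thing transmitted, the same utterance can be shown to one player and spoken to another, in whatever voice the character was given.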
At the moment, though, it's mere whimsy. Current speech-from-text generation software isn't quite as bad as that used by Professor Stephen Hawking, but it still flows very awkwardly. Text-from-speech is pretty good once trained to your voice, but not if you start getting emotional (like when you're screaming for help because a dragon is eating you).
Give it a few years, though, and who knows? This could add a whole new dimension to virtual worlds! Not only do you look like a marsh troll, but you sound like one, too. How groovy is that?
Very groovy! Unfortunately, while we're waiting for it we may have to endure some otherwise excellent games ruined by the ill-conceived, premature use of an inappropriate form of the technology.
Real-time voice communication in virtual worlds does promise great things - just not yet.
The mustachioed man pictured to the left, Richard Bartle, has an excellent, thorough web site: www.mud.co.uk/richard. It explains his projects and insights better than this short biography can. In 1979, Bartle co-created the text-based MUD, the first system for players to share adventures online. He has continued developing games in many forms since that time, and he works as an advisor and commentator in various capacities. His contribution of a simple taxonomy of MMOG players (Killers, Achievers, Explorers and Socializers) has been a valuable framework for discussing online player behavior.
Recently Bartle collected his expertise in book form, now available on Amazon: Designing Virtual Worlds.
Being an EQ bard, I would love to have voice chat. Then I wouldn't have to type in 3-second snippets hehe.
Also, I don't really get all the talk about immersion. I just want to play a game, I don't want to escape reality. If I really want to escape, I will grab a book off the shelf. When you play the game, you are interacting with real people who have real thoughts, feelings and opinions. Maybe it's just me, but I know that the Ogre with the Blade of Carnage isn't really an Ogre and that isn't really a sword. It's just a binary stream represented by pixels on my computer monitor. The point of a game is to have fun, and if voice chatting adds to the fun, what harm is there? In addition there is always the OFF button.
Posted by: Maclyn | 08/18/2003 at 10:30 AM
How is it that one would want to escape from the world, when it's filled with tons of people who want to escape as well?
As for voice masking in general, couldn't you just fake an accent and add a voice mask on top of that to throw people off? For the most part, I wouldn't care about the nitpicking of people, especially when it's my own character.
Posted by: Ian | 08/18/2003 at 03:45 PM
To add on to this, it's a bit disturbing how some of these posters remind me of Tsukasa from .Hack//SIGN series.
"I think I'll like it here... Yeah..."
It's just one more step closer to blurring the lines between reality and fantasy, only then would you have to be worried about crazy stalkers.
Posted by: Ian | 08/18/2003 at 03:54 PM
What I am really worried about is the fact that people who spam with text, MAY get the bright idea to spam their voice.
Ever play half-life and there is some jackass who only communicates with crackling midi music or profanities?
Imagine trying to role-play in your favorite online RPG and your speakers are filled with 1000 voices screaming at once.
That, my friends, is the power of X-box.
Posted by: VGR | 08/19/2003 at 07:23 AM
I'm a 3 year vet of EverQuest (now retired) and I played a good year or so with GameVoice available. We didn't use it all the time, and when we did it was only a few people, namely the guild leadership. We didn't use it as an in game role playing opportunity. As a matter of fact our organizational needs were so great we turned to voice as a more efficient method of having logistical discussions with 3-8 people. Rather than detracting from game play it gave our tired hands a break, so that we could focus our typing efforts on role playing/in game activities and save the sensitive and guild organizational conversations for voice. That way GuildMember_01's indiscretions with GuildMember_02 were dealt with in the "real world" as well as the delicate handling of loot distribution, complex intra-guild relations and some general "relax and chat like we're normal human beings" type chat, so that we could unwind from the stress of making gameplay fun for our members (often at our own expense) and enjoy other members of the leadership as real friends and not just pixels.
To sum up, voice let us do more efficiently the things that were already outside of the scope of the fantasy world.
~foooo
Posted by: Foooo | 08/22/2003 at 04:04 PM
and to the guy who said
i didn't play the chick in pso to become a hot babe, and certainly not to become a better person.
Posted by: olo | 08/23/2003 at 04:35 AM
sorry went wrong,
again, to the guy who said "it was obvious the article was about rpgs, not mario"
I still didn't play the chick in pso to become a hot babe, and certainly not to become a better person.
sorry for the second post,blabla
Posted by: olo | 08/23/2003 at 04:42 AM
But since MMORPG'ers who aren't newbies will see the fallacy of adding voice chat wouldn't they be drawn to the games that do not include it? This could be a good thing, weeding the serious players from the newbies.
Posted by: hoju | 08/27/2003 at 07:29 AM
I agree that voice communication will enhance the immersion factor of any game instead of detracting from it. Saying that people will spam and abuse it is a moot point because people do that already with text and they will continue to do it until doomsday.
Besides there are ways you can counter such people: Muting is one obvious solution. Also if you have voice chat in a MMORPG it would be wise to have a radius based voice system, which means you can only hear the voices of characters immediately around you. To talk to someone else far away you'd have to use text, or (like previously stated) cast a spell to voice chat with him/her. Another interesting idea is some kind of mute spell you could cast on other players, preventing them from talking with text or voice for a short period of time. Thus if there's some jerkhead pestering you with annoying requests for something or other you can either A: Kill him, B: Go somewhere else, C: Cast a Mute spell or D: Mute him permanently with a built-in command.
Also stated already is that the human voice conveys far more emotion than text ever will, and is essential to any game, especially RPG's. Having to read text in a RPG dispels any suspension of disbelief the game may have and you are constantly reminded that you're playing a game-NOT in another reality. Indeed using your voice is a far more social activity than typing as well: when using text all you are doing is reading characters off a screen, but when you are actually conversing with them using your voice it is a totally different thing-it is almost as social as talking to them face to face.
Posted by: Razumen | 08/27/2003 at 08:59 PM
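The radius-based voice system and per-player muting suggested in the comment above could be sketched roughly as follows. This is a minimal illustration, not how any shipped game does it: the flat 2D coordinates, the `positions` and `muted` dictionaries, and the function name are all assumptions made for the example.

```python
import math


def audible_listeners(speaker, positions, radius, muted=None):
    """Return the characters who would hear `speaker`, sorted by name.

    positions: dict mapping character name -> (x, y) world coordinates
    radius:    how far a voice carries, in the same units as the coordinates
    muted:     dict mapping listener name -> set of speakers they have muted
    """
    muted = muted or {}
    sx, sy = positions[speaker]
    heard_by = []
    for name, (x, y) in positions.items():
        if name == speaker:
            continue  # nobody needs their own voice echoed back
        if math.hypot(x - sx, y - sy) > radius:
            continue  # too far away: would have to fall back to text
        if speaker in muted.get(name, set()):
            continue  # this listener has muted the speaker
        heard_by.append(name)
    return sorted(heard_by)


positions = {"bard": (0, 0), "ogre": (3, 4), "elf": (10, 0), "rogue": (1, 1)}
print(audible_listeners("bard", positions, radius=5))
print(audible_listeners("bard", positions, radius=5, muted={"rogue": {"bard"}}))
```

A timed mute spell would be the same check with an expiry attached to each entry in `muted`; the permanent command is the case shown here.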
Voice chat is ideal for team-based PvP games. Was involved with a PA in Shadowbane that used TeamSpeak and it was a great tool for coordinating large groups of people. However, players of team-based PvP games also play these games the way many people would go to their bowling or softball league... a team sport more than a romp through a Tolkien or Asimov novel. Richard is right on with regards to his predictions of the impact of voice chat in mmoRPg's.
Imagine going to see the LoTR and instead of the current cast they had put Chris Rock in the role of Gandalf, Britney Spears in the role of Legolas, and Louie Anderson (may he RIP) in the role of Boromir. It would be mildly amusing for a while as a farce. However, the novelty of a ghetto version of Gandalf and whiney-voiced Boromir wears off pretty quickly and cognitive dissonance sets in (it's worth looking up). This causes a decrease in immersion as what we expect an aged wizard or stout warrior to sound like is not what we're receiving.
The nice thing about MMO's is that everybody comes to the game minus the "real world" stereotypes and is judged based on their in-game activities. Thus, the 16 year-old living with their parents has the same opportunity to earn peer respect as the 27 y.o. playing after coming home from their job at IBM. The ruralite with a H.S. diploma living in the hills of West Virginia is perceived the same as the PhD living in the suburbs of Chicago. You bring voice into MMOGs and you bring all the baggage of real world stereotypes in with it.
Posted by: Jonzun | 08/28/2003 at 06:24 AM
The thing is we are all pioneering virtual world technology that may lead to who knows what one day, the holodeck?
We're going to try and evolve many things and just see where the chips fall. This just may be a natural evolution.
Posted by: Molokan | 08/28/2003 at 07:24 PM
Hard core RP'ers always annoyed me when i used to play EQ. They are always crying about how they want their own servers, a safe haven for fellow RP'ers to talk 'in character.' It is so silly to think that just because thou doth not speaketh like Monty Python's Holy Grail, that you are not role playing. The bottom line is, in these worlds, you ARE anonymous. Everyone knows that they are anonymous and therefore, nobody acts as they would in real life.
Adding voice chat will be very similar to typed chat that is 'out of character.' Gamers will continue to play because their real lives will still be less satisfying than their virtual ones and the same whining RP'ers will continue to whine.
Posted by: evolume | 08/29/2003 at 10:54 AM
I read the article and most of the comments. To be honest, this whole issue - broadly, the fact that there *is* no way to make people stay in character (most of the time) in an online fantasy world - is the precise reason why I don't play massively multiplayer online games. That's with typing. And yes, it would be ten times worse spoken.
I used to play pen and paper D&D in groups of real people speaking. That's different for two reasons: first, as the author said because the game world is imagined and not 'shown'; but second and perhaps more importantly, because the people you're playing with are guys you know and they will be making at least some vague effort to keep in character. That's something you just can't rely on or even realistically hope for in an MMORPG.
I now play D&D online using IRC, so we type. Same deal: we try to keep in character. (Sure, there is sometimes out-of-character backchat... and text is great for backchat, since it's easier to skip or compartmentalise.)
As for the speech conversion, I happen to have a master's degree in related topics. I also work on a voice conferencing system. So I know something about that kind of deal.
In all likelihood, the process of converting your voice to a different voice and accent would not be done via plain text.
Fundamentally, as noted you'd lose most personality and expression. Text to speech systems *are* good nowadays but they can't, and never will, work magic. It needs a good writer to write sentences that contain the personality or emotion you want to express in the first place. You can't do that in a hurry, and it won't happen from text that was obtained initially from your speech.
On a technical level, accurate voice recognition is a really, REALLY difficult problem if you are going to speak in different tones (e.g. screaming at the microphone), with background noise (e.g. game soundtrack, somebody else comes into the room and asks whether you want dinner), using different accents (e.g. you are French), with a huge vocabulary (e.g. a fantasy world with lots of strange placenames, and character names people are free to create as they see fit), and possibly with poor or abbreviated grammar (because you're in a hurry, or again you are not a native English speaker). Even a human given this task will not achieve near 100% accuracy. Computer speech recognition is very far from that.
More likely would be an attempt at phoneme recognition. Phonemes are the basic speech sounds - for example the 'b-' sound at the start of 'book'. Recognising these comes with a high degree of error but, if you are only transmitting speech, it doesn't cause too much of a problem if the phoneme was slightly wrong, as the resulting output is likely still understandable. Transmitting phonemes, with timing and some other information, could provide sufficient data to reconstruct voice using different sounds; you could even have it transform vowels to shift your accent, or adjust timings if trolls speak more slowly. This is also a rather effective data compression scheme... I'm not sure how well it would really work or whether this is already used, but it's the kind of area to look at IMO.
I think changing your voice (female-male, human-troll, young-old, even changing some regional accents) is probably a more realistic task than speech-to-text in an MMORPG situation. Neither solution would solve the problem that many people aren't interested in roleplaying on those systems, which is the fundamental reason for difficulty in creating an immersive fantasy world.
--quen
Posted by: quen | 09/01/2003 at 07:27 AM
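The phoneme-transmission idea in the comment above (send phonemes with timing, then reshape them on the receiving end) could look something like this in outline. It is a sketch under stated assumptions: the `Phone` record, the phoneme labels, and the vowel substitution table are all invented for illustration, and a real system would need an actual phoneme recogniser and synthesiser on either side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phone:
    symbol: str       # phoneme label, e.g. "b" (labels here are illustrative)
    duration_ms: int  # how long the sound was held
    pitch_hz: float   # rough pitch at that moment


def transform(phones, vowel_map=None, tempo=1.0, pitch_scale=1.0):
    """Reshape a transmitted phoneme stream into a different character voice.

    vowel_map:   substitutions that shift the accent, e.g. {"uh": "oo"}
    tempo:       > 1.0 stretches durations (a slow-talking troll)
    pitch_scale: < 1.0 lowers the voice, > 1.0 raises it
    """
    vowel_map = vowel_map or {}
    return [
        Phone(
            symbol=vowel_map.get(p.symbol, p.symbol),
            duration_ms=round(p.duration_ms * tempo),
            pitch_hz=p.pitch_hz * pitch_scale,
        )
        for p in phones
    ]


# "book" as three phonemes, spoken at an even pitch
book = [Phone("b", 60, 120.0), Phone("uh", 100, 120.0), Phone("k", 80, 120.0)]
troll_book = transform(book, vowel_map={"uh": "oo"}, tempo=1.5, pitch_scale=0.5)
for phone in troll_book:
    print(phone)
```

Each record is a handful of bytes, which is why the comment describes the scheme as effective compression: the receiver rebuilds the sound from its own voice data rather than receiving the waveform.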
I do not believe in artificial restrictions on any sort of development, unless it's a matter of ethics. Sure, voice chat will change the design of virtual worlds dramatically, but designers will adapt, as they always have. It's ridiculous to outright dictate how a game should or should not be without even exploring the possibilities the option opens up.
There's no doubt that voice is a better form of communication than text. It communicates mood and expression; how many times have we tried to make sarcastic comments or jokes online, only to have it completely misunderstood?
For sure more of your off-line self will be communicated to others with voice chat. But wouldn't that add complexity and dimension to your otherwise bits-n-bytes character, especially with voice alteration in place to simulate your character's voice register?
Besides, better communication will mean more people will feel responsible for what they say online. This, I feel, is a good thing. Countless times I've seen racist and sexist comments being made while gaming online. Maybe there will be less of that if people actually had to say the words they type.
Posted by: SpunikSweetheart | 09/03/2003 at 04:32 PM
Well, let's put it like this. Let's say you have a voice-recognition program that takes what you say and interprets it as text. It's output as text to everyone else; what is the difference?
Or, let's take the opposite side and say the game takes everyone's text and outputs it in a pre-approved voice scheme for that archetype (say, model).
Combining these with the linguistic structures that have been discussed above and the *inevitability* that voice will be brought into MMOs, I believe an acceptable situation can be resolved.
Posted by: Arafelis | 09/14/2003 at 08:24 PM
Hmmm,personally can't see what the problem is. I think voice chat is ok but on a smaller scale. My guild in star wars galaxies(some of it ne ways) uses voice chat to communicate and i think its brilliant. When ur a crafter as well and ur sat there going through the tedious task of grinding it becomes very boring very quickly. But being able to talk to ur fellow guildies about quests in game etc made it more interesting. But everyone using voice would become a headache, the voice channel just wouldn't be able to handle all the traffic, i think things would become confusing especially with several conversations happening at once. But on a smaller scale with a guild its an interesting and powerful tool imagine the possibilities when pvping on a big scale and instead of having to waste valuable time typing 'heal me' you could just say it! Ive been playing mmogs for a few years now, so in the words of me mate valor:'Voice chat is the future, embrace it'
Posted by: Kalidor | 09/15/2003 at 02:46 AM
While I see your point about hearing the "elf babe's" english accent (or more likely her unmistakably masculine voice) ruining immersion, I will jump on my soapbox to again curse what to me kills an EQ conversation faster than anything: BAD MIDDLE ENGLISH.
Seriously, if you don't understand that thy is the possessive form of thou or that thou is the familiar form of the more formal you, then please stop role playing with an English accent. Right now.
Please.
Talk like a pirate, or like a hollywood arab or anything, please just stop saying "I willst go withe thou" to me.
Unfortunately, I've heard no plans for anyone to write realtime Middle English conjugation checking into their MOG. Believe me, it would be a better feature than voice-chat.
Posted by: illovich | 09/26/2003 at 12:53 PM
There's no way on earth my wife is going to watch me talking to a monitor :) She'll get me sectioned :P
Loads of pros and cons to this argument, personal choice will win out. I can see the FPS games using this more than MMRPGS simply because of the fact that I wouldn't want a troll talking to me in a US accent. That would kill the whole experience for me.
Posted by: matt (uk) | 09/30/2003 at 02:13 AM
If you don't start something, you'll never be able to perfect it.
Posted by: Vertigo | 09/30/2003 at 09:44 PM
Piled up to the neck in ____.
Posted by: Bob | 10/06/2003 at 05:26 PM
It seems to me as though the argument is irrelevant. Companies that are interested in voice chat are going to employ it. And the majority of gamers support it. Changing your voice to a new pattern that cannot be decoded is actually very simple; as I am an audio engineer I know of what I speak. As the virtual world marches forward towards replacing the mundane, social interaction at a more personal level will be sought by the players. The next generation of gamers is for this and corporations see it. It may all be irrelevant in twenty years as the technology is increased to a level that is full immersion. A group of companies that I will leave unnamed at this time is working hard to create machines and software that will react to thought, ergo you think what you want to say to someone and the machine says it for you. It's not as far off as you would think. But I digress.
My position is that I'm for it. In the VR world it is part of the next level of experience enhancement.
Posted by: Tim | 10/13/2003 at 10:42 AM
My friend and I had a rather interesting solution after reading this article and it goes as follows:
Start including voice recognition software in games with lots of chatting. The messages would still be sent and received as text, but the player could input their messages vocally if they chose. It would be much, much faster than typing, and because no one is actually hearing your voice, it doesn't break the immersion or character.
It's not perfect, as I understand it voice recognition technology still has a long way to go before it works really well, but people make many typos when they're trying to type at game speed anyways.
Then, in the future, when computer vocalized text doesn't suck so much, you could pick a voice for your character and have the message read aloud by the computer at the other end in that voice!
And it's also Lo-Bandwidth, because all you're actually sending is text.
Again, I'm sure there are a myriad of possible problems with the system, but I rather like the idea at first glance.
Posted by: Brad Hackinen | 10/23/2003 at 05:08 PM
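The low-bandwidth point in the comment above can be checked with rough arithmetic. The figures below are assumptions chosen for illustration: a conversational pace of about 150 words per minute, about 6 bytes per word including the space, and uncompressed telephone-quality audio at 8,000 samples per second with 8 bits per sample. Real voice chat uses compressed codecs, so the actual gap would be smaller than this, but the order of magnitude is the point.

```python
def text_bytes_per_second(words_per_minute=150, bytes_per_word=6):
    """Approximate data rate of speech once it has been reduced to text."""
    return words_per_minute * bytes_per_word / 60


def pcm_bytes_per_second(sample_rate_hz=8000, bits_per_sample=8, channels=1):
    """Data rate of uncompressed PCM audio at the given quality."""
    return sample_rate_hz * bits_per_sample * channels / 8


text_rate = text_bytes_per_second()
voice_rate = pcm_bytes_per_second()
print(f"text:  {text_rate:.0f} bytes/s")
print(f"voice: {voice_rate:.0f} bytes/s")
print(f"voice needs roughly {voice_rate / text_rate:.0f}x the bandwidth")
```

Under these assumptions, text costs a few tens of bytes per second per speaker, which even a dialup connection can carry alongside the game's own traffic.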
It does depend on how fast you can type. I can type almost as fast as I can talk, so the only time voice chat matters to me is in games like Counter-Strike, where you can have real tactics, but things happen so fast that if I stop to say "He's hiding behind that crate!" then the man (or, excuse me, person) behind that crate will kill me.
I can certainly understand how this _could_ kill immersion. I think the best way to explain this is to refer to cosplay, as done at Otakon 2003.
Roughly half the population at any Otakon is dressed up as anime characters. Those that are stay in character roughly half the time. And yet, rather than destroy the metaphor, it enhances it. I may have been able to press X twice and get Squall to attack in Final Fantasy 8, but dressed as Squall, I can have an actual, honest-to-God swordfight.
It's true that this doesn't make it nearly as easy for me to, say, pretend I'm a girl, or a mountain troll, but I usually am myself anyway. If a 16-year-old slightly overweight white boy with an afro can slice you in half in two seconds flat in-game, it doesn't matter if everyone knows he's not a ninja with a lightsaber.
In fact, what I always go looking for in games is not a way to be someone that I'm not, but a way to be in another world, and do things I would never be able to do in reality. In reality, you don't simply respawn and go get your revenge every time you die.
It may be true that voice would destroy certain games. I'd be curious to see that. For the most part, though, I think that even with no modulation at all, people can still roleplay well enough.
And if you discover that this elf is from England, so what? It's an elf from England. Everything else about it is still the same -- it still has wicked weapons, fast reflexes, and pointy ears.
One more thing: I refuse to address technology issues. 56k still works for some games, but barely. At DSL speeds, voice chat can be very minimal compared to overall bandwidth usage.
Posted by: Jedi Ninja | 10/25/2003 at 09:46 AM
I think Mr Bartle has the perfect solution with the Speech -> Text then Text -> Speech conversion, though I have different reasons.
First, some person mentioned use of third party programs such as teamspeak. My Dark Age of Camelot guild uses teamspeak and it's an awesome tool. There are problems with it however.
1. Many people who live geographically close sound similar, it can be hard to tell who is speaking.
2. Bandwidth. Those on dialup are usually not able to use it.
Now, why do I differ on the reasons for Mr Bartle's proposed system? The hearing impaired. I haven't seen any mention of that aspect, yet there are many people playing these games who have partial or full hearing loss. If voice-only communication was introduced, all of these players would be excluded.
The speech->text then text->speech solves that problem as the text version can be displayed, or both the text and the sound can be played.
The other point I differ with Mr Bartle is that the technology is not there so we shouldn't try it. The technology is there, it's basic and doesn't work great, but it could be done.
When you were developing a MUD did you say "Well, in 10 or 15 years we'll be able to put some great graphics in, let's wait until then."? No. We can do speech->text now, and we can do text->speech now ... it won't sound realistic, but so what? Neither did graphics until recently and nobody cared about that at the time.
Some of this could even be done third party. For example in DAoC to chat I hit Enter and type. A third party program could be listening to my mic with a keyboard hook ... when I say something it converts it into text, hits enter and sends it.
Posted by: James | 10/30/2003 at 08:34 AM
Voice should not be in mmog, mmorpg, etc. There are a VERY few select exceptions to this, such as Planetside. It's difficult to receive and give orders, while playing a complex military tactical 1st person shooter at the same time. Other games, like Diablo for instance, would just be bad for voice. Or everquest, another very famous name. Programs like TS2 make the voice/keyboard issue a problem, and in some cases such as Planetside, solve a problem. I still agree with you though, and this has been a useful article. :)
Posted by: tai | 11/03/2003 at 01:59 PM