SALT and VoiceXML are both markup languages for writing applications that use voice input and/or output. Both languages were developed by industry consortia (the SALT Forum and the VoiceXML Forum, respectively), and both were contributed to the W3C as part of its ongoing work on speech standards.
So why two specifications? Mainly because they were designed to address different needs, and they were designed at different stages in the life cycle of the Web. VoiceXML arose out of a need to define a markup language for over-the-telephone dialogs (Interactive Voice Response, or IVR, applications), and at a time (1999) when many pieces of the Web infrastructure as we know it today had not matured. SALT arose out of the need to enable speech across a wider range of devices, from telephones to PDAs to desktop PCs, and to allow both telephony (voice-only) and multimodal (combined voice and visual) dialogs. SALT was also designed at a time (2002) when many key Web technologies (XML, DOM, XPath, etc.) had become well established.
I will declare my interest here: I represent Microsoft in the SALT Forum's Technical Working Group. However, I have studied both SALT and VoiceXML in depth, and will use this forum to take an objective look at the two specifications and point out the main technical differences between them in an unbiased way. You can decide for yourself which specification is more suitable for your applications. (See Sidebar: Developer Communities.)