I'd say Lua is probably the best way to go if you want to change the footstep type & volume dynamically on a per-scene basis, as you will need Lua tables to organize the sounds & point of origin for each scene.
As trepn says, you can play sounds with the openAL Lua function. Lebostein's tutorial that he linked would likely be a good place to start, though that's an action part solution only. If you want to control the balance & volume levels dynamically, then you'll need Lua script.
-- example of a Lua table containing some footstep sounds
sndz = {}
sndz["wood"] =
{
  "vispath:sounds/footsteps/wood_1.ogg",
  "vispath:sounds/footsteps/wood_2.ogg",
  "vispath:sounds/footsteps/wood_3.ogg"
}
sndz["metal"] =
{
  "vispath:sounds/footsteps/metal_1.ogg",
  "vispath:sounds/footsteps/metal_2.ogg",
  "vispath:sounds/footsteps/metal_3.ogg"
}
-- example of playing a footstep sound with a dynamic volume value
startSound(sndz[Values["ground_type"].String][math.random(3)], {flags=1, volume = math.random(75, 100)})
In the example above I've assumed that a value in the editor holds a string that determines the ground type: metal, wood, concrete, debris, stones, rock, fauna, etc. From that we pick a random sound from the matching table (technically we could get the total index count of the table dynamically with Lua rather than declaring the number manually). We then set the volume to a random value between 75 & 100, so that each time a sound is played, even if it's the same sound, it will sound different because the volume level will be different.
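The table-length idea from the previous paragraph could be sketched roughly like this. `pickFootstep` is a hypothetical helper name, not a Visionaire function; it uses Lua's `#` length operator instead of a hard-coded 3, so adding a fourth file to a table needs no other change. The table here is just a copy of the one above, and the `startSound` call is left commented out so the snippet also runs in a plain Lua interpreter.

```lua
-- copy of the footstep table from the example above
local sndz = {
  wood = {
    "vispath:sounds/footsteps/wood_1.ogg",
    "vispath:sounds/footsteps/wood_2.ogg",
    "vispath:sounds/footsteps/wood_3.ogg"
  },
  metal = {
    "vispath:sounds/footsteps/metal_1.ogg",
    "vispath:sounds/footsteps/metal_2.ogg",
    "vispath:sounds/footsteps/metal_3.ogg"
  }
}

-- pickFootstep is a hypothetical helper (not a Visionaire function)
-- returns a random sound path & a random volume (75 to 100) for the ground type,
-- or nil if no sounds are listed for that ground type
function pickFootstep(groundType)
  local list = sndz[groundType]
  if not list or #list == 0 then return nil end
  return list[math.random(#list)], math.random(75, 100)
end

-- inside Visionaire this might be used as:
-- local path, vol = pickFootstep(Values["ground_type"].String)
-- if path then startSound(path, {flags=1, volume=vol}) end
```

Returning nil for an unknown ground type means a typo in the editor value just results in no sound rather than a script error.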
Balance is a lot more complicated to give an example of because it depends heavily on the perspective you use for your game. Traditionally, 2D games tend to use the same perspective style for most scenes so they don't need to create tons of character animations from different perspective points. 3D is a lot different because the camera can change position. Anyway, the next thing is figuring out whether the balance should be based on the character's/camera's position relative to a specific source point, or on the character's own position, which means footsteps will always be centered but all other sounds will be panned according to where they are relative to the character.
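For a simple side-on 2D scene, either approach boils down to turning a horizontal distance into a balance value, which could be sketched roughly as follows. `balanceFromSource` is a hypothetical helper, and both the -100 to 100 range and the `balance` key for `startSound` are assumptions to check against the Visionaire documentation; the character position lookup in the comments is untested too.

```lua
-- balanceFromSource is a hypothetical helper (not a Visionaire function)
-- emitterX:    x position of whatever makes the sound
-- listenerX:   x position the sound is heard from
-- maxDistance: distance in pixels at which the sound is fully panned (must be > 0)
-- returns a whole number from -100 (hard left) to 100 (hard right);
-- that range is an assumption
function balanceFromSource(emitterX, listenerX, maxDistance)
  local b = (emitterX - listenerX) / maxDistance * 100
  -- clamp so positions beyond maxDistance stay fully panned
  if b > 100 then b = 100 elseif b < -100 then b = -100 end
  return math.floor(b + 0.5) -- round to the nearest whole number
end

-- footsteps heard from the middle of a 1280 px wide scene (assumed values):
-- local bal = balanceFromSource(game.CurrentCharacter.Position.x, 640, 640)
-- startSound(path, {flags=1, volume=vol, balance=bal})
```

For the second approach you would swap the arguments around: the other sound's position becomes the emitter and the character's position becomes the listener, while footsteps are simply played with a balance of 0.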