One of the reasons that Android phones have such good voice recognition is that, under Peter Norvig's guidance, Google has acquired an immense corpus or database of what and how people speak. It is my contention that gestures and other non-verbal means of communication will eventually regain some of the primacy that they had before primates evolved verbal communication. If this is to happen to gesturing, then we need some fast, cool, effective methods for recording the many gestures people make.
This is not a new thought. Personally and over the years, I have spent some fascinating moments exploring dance notation. And sign language is the codification of gesturing. But coming back to computers, we have all the methods used by computer games to record and replay the movements of game characters. Collada, FBX and the new glTF come to mind.
Here's the thing: gesturing can generate huge amounts of data per second. It's nearly as good (or bad - depending on your outlook) as video - if for no other reason than that the data gathering is usually via video. Secondly, if data scientists are ever to be able to parse our gestures they will need the data in digital format. The word represented by the letters 'donut' is far smaller than an audio file of the sound bite, let alone the object in question.
Because of my joy in exploring the Leap Motion device, I have spent the last month or so looking into ways of registering gestures.
One of my experiments is to record all the messages sent out by the Leap Motion device and save them in JSON format. The messages are used by software developers and for testing. In normal coding such messages are typically short and sweet (or not). But even a short gesture may generate a JSON file of over a megabyte. If you have a Leap device, you can have a look at the app here:
With source code and more details here:
Thus, as helpful as this app should be to developers and testers (especially as none of the example apps in the Leap Motion examples site can do this), this is not an app that should be used to record and replay a corpus of thousands or millions of gestures because the file sizes are too large.
In July I wrote a paper using Google Docs about gesture recording. You can have a look at the paper here:
Skeleton API Considerations for Leap Motion Devices R2
In this document I recommend looking at the BVH format. This is not my first encounter with BVH. Previously I wrote a five-post tutorial on getting animations into Three.js by importing BVH files into Blender. I have yet to hear of or see anybody else who was able to follow - successfully - the tortuous path I proposed that you should dance down. And, in the meantime, there have been so many changes that half the stuff no longer works.
Anyway, because of the paper and because of the Leap device, I decided to write a BVH reader based on code I had found [only after many searches over a long period of time] including these two examples:
Even though I code a lot, I am not really a programmer and it soon all started to get a bit daunting. When that sort of thing happens I tend to go into denial and whatever. And I did a Google search on 'Three.js BVH reader' and up came this:
I nearly fell out of my chair. Here was everything I wanted: A simple Three.js app that reads BVH files. And more than that, the code itself is fascinating. The methods the author uses to do 'if/then' within a 'for' loop were totally new to me.
Saqoosha: you are amazing! And thank you for your kind permission to build upon your code. Here's Saqoosha's web site:
So in short order I had several demos up and running - each accessing slightly different dialects of BVH. The links are at the end of this post. And now I have had several days of reading and thinking about BVH and comparing it with other methods.
And the TL;DR is that the BVH format is awesome. Accept no substitute.
You can read about BVH here and here and here.
Thing #1. The main thing is that the main data part of the format is about as sparse as you can get in uncompressed ASCII. It's just numbers and spaces. And, most important, it's only the numbers you actually need.
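To show just how plain the data part is, here is a minimal sketch in Python of reading it. This is not Saqoosha's code; the sample text and the function name `parse_motion` are made up for illustration, and the sketch assumes the usual layout of a `MOTION` line, a `Frames:` line, a `Frame Time:` line and then one line of numbers per frame.

```python
# A tiny, made-up MOTION section: two frames of six numbers each.
BVH_MOTION = """MOTION
Frames: 2
Frame Time: 0.0333333
0.0 90.0 0.0 10.0 0.0 -5.0
0.0 91.5 0.0 12.0 0.5 -4.0
"""

def parse_motion(text):
    """Return (frame_time, frames) from the MOTION part of a BVH file."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    start = lines.index("MOTION")
    frame_count = int(lines[start + 1].split(":")[1])
    frame_time = float(lines[start + 2].split(":")[1])
    first = start + 3
    # Each frame is nothing more than numbers separated by whitespace.
    frames = [[float(v) for v in ln.split()]
              for ln in lines[first:first + frame_count]]
    return frame_time, frames

frame_time, frames = parse_motion(BVH_MOTION)
print(len(frames), "frames,", len(frames[0]), "numbers per frame")
```

Everything after the three header lines is one `split()` and one `float()` per value, which is about as little parsing as a text format can ask for.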
Let me try and explain. To position something like a hand or foot in space you need to specify its X, Y and Z as well as the pitch, roll and yaw angles. That's six numbers - the 'six degrees of freedom'. But for everything other than the root joint, a BVH file typically records only pitch, roll and yaw - three numbers. It assumes you can fill in the X, Y and Z yourself at runtime. How? Because the header tells you the offset distances for all the body bits. In essence, for the purpose of this app, the length of an arm or a leg is a constant not a variable, so you don't need to repeat these values endlessly and the actual position is calculated in real-time frame by frame. Of course, all of this is recursive, which short circuits my tiny brain.
Anyway, the main thing about BVH is that it is not possible to come up with a smaller method of recording motion than BVH. [I say this in the context of being a person often in the midst of people who understand mathematics - so wait and see awhile before accepting this assertion.]
Thing #2. Since the X, Y and Z information is all in the header, you can change it at any time - even at run time - and make the character morph as it's moving. Thus you can fairly easily adapt a BVH file to different character sizes.
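The recursion is easier to see in a toy version. The sketch below is a deliberate simplification and not how any particular BVH reader does it: it works in a flat 2D plane with a single angle per joint, whereas real BVH joints carry three rotation angles and need 3D rotation matrices. The skeleton, the joint names and the function `joint_positions` are all hypothetical.

```python
import math

# Hypothetical arm: the offsets are the constants that would live in the header.
SKELETON = {
    "name": "shoulder", "offset": (0.0, 0.0),
    "children": [{
        "name": "elbow", "offset": (10.0, 0.0),
        "children": [{
            "name": "wrist", "offset": (8.0, 0.0), "children": [],
        }],
    }],
}

def joint_positions(joint, angles, parent_pos=(0.0, 0.0), parent_angle=0.0, out=None):
    """Walk the tree and return {joint name: (x, y)} for one frame of angles."""
    if out is None:
        out = {}
    # A joint's offset is measured in its parent's (already rotated) frame.
    c, s = math.cos(parent_angle), math.sin(parent_angle)
    ox, oy = joint["offset"]
    pos = (parent_pos[0] + c * ox - s * oy, parent_pos[1] + s * ox + c * oy)
    out[joint["name"]] = pos
    # The joint's own rotation (degrees, as in BVH) only moves its children.
    angle = parent_angle + math.radians(angles[joint["name"]])
    for child in joint["children"]:
        joint_positions(child, angles, pos, angle, out)
    return out

# One frame: only angles are stored, never positions.
frame = {"shoulder": 90.0, "elbow": -90.0, "wrist": 0.0}
positions = joint_positions(SKELETON, frame)
for name, (x, y) in positions.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

With the upper arm pointing straight up and the forearm bent back to horizontal, the wrist ends up at roughly (8, 10), and that position was never stored anywhere; it falls out of the fixed offsets plus the angles.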
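As a rough sketch of what adapting to a different character size could look like, the Python below scales every offset in a skeleton tree and leaves the motion data untouched. The nested-dict layout and the names `ARM` and `scale_skeleton` are assumptions made for this example, not part of the BVH format or of Saqoosha's reader.

```python
# Hypothetical skeleton tree holding the header offsets (X, Y, Z).
ARM = {
    "name": "shoulder", "offset": (0.0, 0.0, 0.0),
    "children": [{
        "name": "elbow", "offset": (10.0, 0.0, 0.0),
        "children": [{
            "name": "wrist", "offset": (8.0, 0.0, 0.0), "children": [],
        }],
    }],
}

def scale_skeleton(joint, factor):
    """Return a copy of the tree with every offset multiplied by factor."""
    return {
        "name": joint["name"],
        "offset": tuple(v * factor for v in joint["offset"]),
        "children": [scale_skeleton(c, factor) for c in joint["children"]],
    }

# A half-size character can replay exactly the same frames of angles.
child_arm = scale_skeleton(ARM, 0.5)
print(child_arm["children"][0]["offset"])
```

Uniform scaling is the simplest case; changing individual offsets (longer arms, shorter legs) works the same way, one joint at a time.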
Thing #3. All the movement data is in an array of strings which contain the relevant angles. At runtime you can easily splice, pull or shift the array and update the character to have a new series of motions. So you could have a character moving about for twenty minutes but be, say, just twenty seconds ahead in terms of data that needs to be loaded.
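The 'twenty seconds ahead' idea can be sketched as a small rolling buffer. This is an illustration in Python rather than the JavaScript array methods mentioned above, and the class name `FrameBuffer` and its methods are hypothetical.

```python
from collections import deque

class FrameBuffer:
    """Holds only a short look-ahead of frame lines instead of the whole motion."""

    def __init__(self, frame_time, lookahead_seconds):
        self.frame_time = frame_time
        self.lookahead_frames = round(lookahead_seconds / frame_time)
        self.frames = deque()

    def needs_more(self):
        # True when fewer than the wanted seconds of motion are buffered.
        return len(self.frames) < self.lookahead_frames

    def extend(self, lines):
        # Append newly loaded frame lines (strings of numbers) to the end.
        self.frames.extend(lines)

    def next_frame(self):
        # Take the oldest line off the front and turn it into angles.
        line = self.frames.popleft()
        return [float(v) for v in line.split()]

buffer = FrameBuffer(frame_time=0.05, lookahead_seconds=20)
buffer.extend(["0.0 1.0 2.0"] * 400)   # pretend this chunk just arrived
angles = buffer.next_frame()            # play one frame
print(angles, buffer.needs_more())
```

After playing a single frame the buffer is one frame short of its twenty seconds, so `needs_more()` reports that it is time to fetch the next chunk.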
Thing #4. BVH is supported by Blender, Daz, MakeHuman, Mixamo, FreeMocap and probably a number of other suppliers of 3D stuff. It's a fairly safe format. And the only commonly accepted format dedicated to motion.
Thing #5. The format is quite flexible. It can handle all the bones in the toes and fingers, or creatures with seven tentacles or just a robot arm with three moving parts. This does mean that there are a number of BVH 'dialects' out there, but my guess is that a good parser will eventually be able to identify the major types and adjust accordingly.
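Here is a minimal sketch, in Python, of why one parser can cope with toes, tentacles or robot arms: the header is just a nested list of joints, so a short recursive reader handles any depth or number of limbs. The sample header and the names `parse_hierarchy` and `count_channels` are invented for this example, and the sketch assumes the common keywords (`ROOT`, `JOINT`, `End Site`, `OFFSET`, `CHANNELS`); some dialects may differ.

```python
BVH_HEADER = """HIERARCHY
ROOT Hips
{
  OFFSET 0.0 0.0 0.0
  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
  JOINT Spine
  {
    OFFSET 0.0 5.0 0.0
    CHANNELS 3 Zrotation Xrotation Yrotation
    End Site
    {
      OFFSET 0.0 4.0 0.0
    }
  }
}
"""

def parse_hierarchy(text):
    """Parse the HIERARCHY part of a BVH file into nested dicts."""
    tokens = text.split()
    pos = 0

    def parse_joint(name):
        nonlocal pos
        joint = {"name": name, "offset": None, "channels": [], "children": []}
        if tokens[pos] != "{":
            raise ValueError("expected '{' after " + name)
        pos += 1
        while tokens[pos] != "}":
            tok = tokens[pos]
            if tok == "OFFSET":
                joint["offset"] = tuple(float(t) for t in tokens[pos + 1:pos + 4])
                pos += 4
            elif tok == "CHANNELS":
                count = int(tokens[pos + 1])
                joint["channels"] = tokens[pos + 2:pos + 2 + count]
                pos += 2 + count
            elif tok == "JOINT":
                child_name = tokens[pos + 1]
                pos += 2
                joint["children"].append(parse_joint(child_name))
            elif tok == "End":
                pos += 2  # skip the two words "End Site"
                joint["children"].append(parse_joint("End Site"))
            else:
                raise ValueError("unexpected token " + tok)
        pos += 1  # step past the closing brace
        return joint

    if tokens[:2] != ["HIERARCHY", "ROOT"]:
        raise ValueError("not a BVH hierarchy")
    pos = 3
    return parse_joint(tokens[2])

def count_channels(joint):
    """Total numbers expected on each frame line for this skeleton."""
    return len(joint["channels"]) + sum(count_channels(c) for c in joint["children"])

root = parse_hierarchy(BVH_HEADER)
print(root["name"], count_channels(root))
```

The channel count is a handy first check when sniffing out a dialect: if a frame line does not contain that many numbers, the header and the motion data disagree.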
Thing #6. BVH data may be generated either via motion capture devices or by algorithm - and you can mix the two easily.
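A small sketch of the 'by algorithm' route, again in Python and again hypothetical: it generates frame lines for a single joint swinging back and forth, then joins them onto some made-up 'captured' lines. It assumes one joint with three rotation channels in Z, X, Y order; the function name `generate_wave` and all of the numbers are invented for illustration.

```python
import math

def generate_wave(frame_count, frame_time, amplitude_deg=30.0, period_s=1.0):
    """Return frame lines for one joint swinging about Z (degrees)."""
    lines = []
    for i in range(frame_count):
        z = amplitude_deg * math.sin(2 * math.pi * i * frame_time / period_s)
        # Same plain text shape as captured data: numbers and spaces.
        lines.append(f"{z:.4f} 0.0000 0.0000")
    return lines

# Made-up stand-in for lines that came from a motion capture device.
captured = ["5.0000 0.0000 0.0000", "6.0000 0.0000 0.0000"]

# Mixing the two sources is plain list concatenation.
mixed = captured + generate_wave(frame_count=4, frame_time=0.25)
for line in mixed:
    print(line)
```

Because both sources end up as identical lines of text, nothing downstream needs to know which frames were captured and which were computed.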
So is BVH perfect? Perhaps it is, but there is an issue. If BVH is the 'verb' - the thing that gets things moving - then what about the 'noun', the data that needs to be moved about? That is the subject of a whole story in itself and I will talk about this in an upcoming post.
In the meantime, please enjoy the code that Saqoosha wrote to get your screen to dance:
Live demo: http://jaanga.github.io/cookbook/bvh-reader/r1/bvh-reader-saqoosha.html
Live demo: http://jaanga.github.io/cookbook/bvh-reader/r1/bvh-reader-saqoosha-cmu-daz.html
Live demo: http://jaanga.github.io/cookbook/bvh-reader/r1/bvh-reader-saqoosha-truebones.html
Details and source code here: