A Multi-touch Application Development Framework

Friday, February 15, 2008

The Detector

As Fouad was done with the input, the next step was to detect the fingers' positions (blobs) in the frames of the video so they can be processed later on. That's where the detector comes in.

Basically, we wanted a fast algorithm that could run on each frame with the least cost, so our choice was the Component-Labeling Algorithm Using Contour Tracing Technique. That algorithm runs in linear time and was reported to be faster than the alternatives (see the table below).

#CC : Number of connected components

How does the algorithm work?

Just as a polygon can be determined by a set of vertices, a component can be determined by the set of points lying on its contour. Mainly, the algorithm scans a binary image, and the process is divided into four steps (as illustrated in the figure):

1- When an external contour point, say A, is encountered for the first time, we make a complete trace of the contour until we return to A. We assign a label to A and to all points of that contour.

2- When a labeled external contour point A' is encountered, we follow the scan line to find all subsequent black pixels and assign them the same label as A'.

3- When an internal contour point, say B, is encountered for the first time, we assign B the same label as the external contour of the same component. We then trace the internal contour containing B and also assign all of its contour points the same label as B.

4- When a labeled internal contour point, say B', is encountered, we follow the scan line to find all subsequent black pixels and assign them the same label as B'.

When implementing the above procedure, we first add a dummy row of white (background) pixels at the top and associate the image with an array carrying a label for each pixel (initially all pixels are unlabeled).
We start with label C (where C = 1). Following the four steps described above, we are able to assign a label to each pixel. The value of C is incremented whenever a new unlabeled external contour is encountered.
Each time the value of C is incremented, a new blob has been found; a new blob is then created, given a new ID, and added to the list of blobs (which is later handled by the tracker).

As for the contour tracing itself, it works by checking the 8 neighbours of the current contour pixel, in a fixed clockwise order, to find the next pixel belonging to the component. The search is done in an optimized way that makes it fast and also guarantees that each pixel is visited only a bounded number of times during the whole procedure, which keeps the algorithm linear.
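
To make the steps above concrete, here is a rough C# sketch of the scan and the contour tracing. The conventions (1 = black component pixel, 0 = white background, label 0 = unlabeled, -1 = visited background pixel) and all names are my assumptions for illustration only; this is not the actual Touchwork detector code.

```csharp
static class ContourLabeler
{
    // Clockwise 8-neighbour offsets, starting east (direction 0).
    static readonly int[] dx = { 1, 1, 0, -1, -1, -1, 0, 1 };
    static readonly int[] dy = { 0, 1, 1, 1, 0, -1, -1, -1 };

    // Labels the connected components of a binary image (1 = black, 0 = white).
    // In the label map: 0 = unlabeled, -1 = visited white pixel, >0 = component label.
    public static int[,] Label(int[,] img, int w, int h)
    {
        var labels = new int[h, w];
        int c = 1;                                                 // next component label

        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                if (img[y, x] == 0) continue;                      // skip white pixels

                bool aboveWhite = y == 0 || img[y - 1, x] == 0;    // row 0 behaves as if a white dummy row sat above it
                bool belowUnmarkedWhite = y + 1 < h && img[y + 1, x] == 0 && labels[y + 1, x] != -1;

                if (labels[y, x] == 0 && aboveWhite)
                {
                    // Step 1: a new external contour point - label it and trace the whole contour.
                    labels[y, x] = c;
                    TraceContour(img, labels, x, y, c, 7, w, h);
                    c++;                                           // a new blob has been found
                }
                if (belowUnmarkedWhite)
                {
                    // Step 3: an internal contour point - reuse the component's label.
                    if (labels[y, x] == 0) labels[y, x] = labels[y, x - 1];
                    TraceContour(img, labels, x, y, labels[y, x], 3, w, h);
                }
                else if (labels[y, x] == 0)
                {
                    // Steps 2 & 4: a pixel inside the component - copy the left neighbour's label.
                    labels[y, x] = labels[y, x - 1];
                }
            }
        return labels;
    }

    // Finds the next black pixel around (x, y), searching the 8 neighbours clockwise from 'dir'.
    // Examined white neighbours are marked -1, which bounds how often each pixel is visited.
    static bool Tracer(int[,] img, int[,] labels, int x, int y, ref int dir,
                       out int nx, out int ny, int w, int h)
    {
        for (int i = 0; i < 8; i++)
        {
            int d = (dir + i) % 8;
            nx = x + dx[d]; ny = y + dy[d];
            if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
            if (img[ny, nx] == 1) { dir = d; return true; }
            labels[ny, nx] = -1;
        }
        nx = x; ny = y;
        return false;                                              // isolated pixel
    }

    // Traces a complete contour starting at (sx, sy) and labels every point on it.
    // External contours start searching at direction 7, internal ones at direction 3.
    static void TraceContour(int[,] img, int[,] labels, int sx, int sy,
                             int label, int startDir, int w, int h)
    {
        int dir = startDir;
        if (!Tracer(img, labels, sx, sy, ref dir, out int tx, out int ty, w, h))
            return;                                                // isolated point: the contour is just the start

        int cx = tx, cy = ty;
        while (true)
        {
            labels[cy, cx] = label;
            dir = (dir + 6) % 8;                                   // back up two steps before searching again
            Tracer(img, labels, cx, cy, ref dir, out int nx, out int ny, w, h);
            if (cx == sx && cy == sy && nx == tx && ny == ty)
                return;                                            // back at the start, about to repeat: contour closed
            cx = nx; cy = ny;
        }
    }
}
```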

The algorithm is now implemented and has been tested successfully with the other modules of the project.


Updates:

For further processing requirements, some additions were made to the algorithm, including calculating the center of mass of each blob (using its total mass, i.e. the number of pixels in it). In addition to that, blobs are now inserted in sorted order (refer to the tracking post).
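
A minimal sketch of that center-of-mass calculation, assuming the blob's pixels are collected in a list; the type and member names are illustrative, not the actual code:

```csharp
using System.Collections.Generic;

struct DetectedBlob { public int Id; public float Cx, Cy; public int Mass; }

static class BlobMath
{
    public static DetectedBlob MakeBlob(int id, List<(int X, int Y)> pixels)
    {
        long sumX = 0, sumY = 0;
        foreach (var p in pixels) { sumX += p.X; sumY += p.Y; }
        return new DetectedBlob
        {
            Id = id,
            Mass = pixels.Count,                 // the "total mass" is just the number of pixels
            Cx = (float)sumX / pixels.Count,     // center of mass = average pixel position
            Cy = (float)sumY / pixels.Count
        };
    }
}
```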

Further work:

Investigating how to make the thresholding value that determines the blobs more dynamic, depending on the illumination of the received frames.
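
Purely as an illustration of the idea (not something we have implemented), one simple direction would be to derive the threshold from the brightness statistics of each frame, for example:

```csharp
using System;

static class Thresholding
{
    // Illustrative only: derive the binarization threshold from the frame's mean
    // brightness plus a margin proportional to its spread, so brighter frames get
    // a higher cut-off. The factor k is a tuning assumption.
    public static byte DynamicThreshold(byte[] grayPixels, double k = 2.0)
    {
        double mean = 0;
        foreach (var p in grayPixels) mean += p;
        mean /= grayPixels.Length;

        double variance = 0;
        foreach (var p in grayPixels) variance += (p - mean) * (p - mean);
        double std = Math.Sqrt(variance / grayPixels.Length);

        return (byte)Math.Min(255.0, mean + k * std);
    }
}
```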

References

F. Chang, C.-J. Chen, and C.-J. Lu, "A Linear-Time Component-Labeling Algorithm Using Contour Tracing Technique," Computer Vision and Image Understanding, 2004.

Tracking Demo

Well, I'll be discussing the TrackingDemo application with you.

This demo aims to test the tracking module, "twTracker". TrackingDemo is a C# application; each frame, it receives a list of updated blobs from the tracker and draws that list using GDI+, with a different color for each blob.

TrackingDemo contains one main class, "TrackingSimulator", which holds a Hashtable of the current blobs keyed by blob ID. Each frame, a sequence of messages is sent from the Touchwork application to the TrackingDemo application (a sketch of consuming this sequence follows the list):
- A start message with ID "5002", containing the number of blobs in this frame, used for initialization.
- N messages with ID "5000", each containing: blobID, blobPosition (x, y) and blobPressure (the number of pixels in the blob), where N is the number of blobs in this frame.
- An end message with ID "5001", signifying the end of the frame, so processing of the input can start and the blobs can be drawn.
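
Here is a rough sketch of how that per-frame message sequence could be consumed. Only the message IDs (5002 start, 5000 blob, 5001 end) and their payloads come from the protocol above; the BlobMessage layout, the delivery mechanism and the event are illustrative assumptions:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message layout: only the IDs and payload fields come from the protocol above.
struct BlobMessage { public int MsgId, BlobId, X, Y, Pressure, Count; }

class FrameReceiver
{
    private List<BlobMessage> frameBlobs = new List<BlobMessage>();

    // Raised once per completed frame with the blobs of that frame.
    public event Action<List<BlobMessage>> FrameReady;

    public void Handle(BlobMessage m)
    {
        switch (m.MsgId)
        {
            case 5002: frameBlobs = new List<BlobMessage>(m.Count); break;  // start: blob count for this frame
            case 5000: frameBlobs.Add(m); break;                            // one blob: id, position, pressure
            case 5001: FrameReady?.Invoke(frameBlobs); break;               // end: process the input and draw
        }
    }
}
```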

Now I have a list of blobs from the tracking module and the Hashtable of my current blobs, so I compare them by ID to check the state of each blob. There are 3 cases:
1) New blob: the ID is not found in the Hashtable, so this is a new blob, and I generate a random color for it.
2) Updated blob: the ID exists in the Hashtable, so I update the position of the blob, and it keeps its color.
3) Removed blob: the remaining blobs in the Hashtable that don't match any blob in the new list are removed from the Hashtable.

Now that I've updated my list of blobs, I draw each of them at its position with its assigned color, and the blobPressure is used to determine the radius of the drawn circle.
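
A simplified sketch of this compare-and-draw step (reusing the BlobMessage struct from the sketch above). The three cases follow the description here; the TrackedBlob type and the radius formula are assumptions, not the actual TrackingSimulator code:

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

class TrackedBlob { public Point Position; public int Pressure; public Color Color; }

class TrackingSimulator
{
    private readonly Dictionary<int, TrackedBlob> current = new Dictionary<int, TrackedBlob>();
    private readonly Random rand = new Random();

    public void UpdateBlobs(List<BlobMessage> frame)
    {
        var seen = new HashSet<int>();
        foreach (var b in frame)
        {
            seen.Add(b.BlobId);
            if (!current.TryGetValue(b.BlobId, out var blob))
            {
                // Case 1: a new blob gets a random colour.
                blob = new TrackedBlob { Color = Color.FromArgb(rand.Next(256), rand.Next(256), rand.Next(256)) };
                current[b.BlobId] = blob;
            }
            // Case 2 (and the new blob too): update position and pressure, keep the colour.
            blob.Position = new Point(b.X, b.Y);
            blob.Pressure = b.Pressure;
        }
        // Case 3: blobs that did not appear in this frame are removed.
        foreach (var id in new List<int>(current.Keys))
            if (!seen.Contains(id)) current.Remove(id);
    }

    public void Draw(Graphics g)
    {
        foreach (var blob in current.Values)
        {
            // The pressure (pixel count) drives the radius of the drawn circle.
            int radius = (int)Math.Max(4, Math.Sqrt(blob.Pressure));
            using (var brush = new SolidBrush(blob.Color))
                g.FillEllipse(brush, blob.Position.X - radius, blob.Position.Y - radius, 2 * radius, 2 * radius);
        }
    }
}
```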

That's it. The TrackingDemo is finished and worked well; it also proved that the tracking module is working well, thank God.

Wednesday, February 13, 2008

Too much to do ..

Well, this is too much for a day ..

Fouad and I went to college today to try to fix the hardware. Fouad went first, I followed. We were finally able to fix it temporarily by trying the new circuit boards and connecting the IR LEDs in parallel. It worked, but the light gets weaker the farther you move away from the power source: we need an electrician!
We also ran into a small problem with the lighting and the threshold for detecting the blobs. We'll postpone fixing it till we fix the hardware.
Conclusion: we had to fix the hardware to give a live demo. The live demo was meant to test twInput, the input module: whether it now captures frames from a live camera feed so the rest of the modules can go on with their work. Thanks to Fouad, it's working great now. But the live demo itself? Not that sweet!

Summary of today's decisions:

  • Roles have been distributed among us:
    • Developers: All four.
    • Quality Engineers (QE): All four - We'll be testing other team members' modules.
    • Engagement Manager (EM): Mahmoud - Responsible for interacting with clients (professors!!)
    • Minute Keeper: Roaa - Times, schedules, notes, meeting minutes, etc. (the yellow notebook!!)
    • Configuration Manager & Information Systems: Fouad
    • Project Leader (PL): Alaa
  • This blog will be public from now on.
  • Roaa will set up a way to send us updates via email rather than RSS feeds.
  • Tasks will be kept in an online task list, containing the WBS (Work Breakdown Structure) along with the Issues List, containing the issues and bugs found and their status.
  • Meetings with the ITWorx supervisors will be bi-monthly or whenever needed. Their time slot changes to Wednesdays from 4:30 pm till 5:30 pm.
What's next?!

  1. Fix the hardware - Waleed Abdel Wahab from CriticalSites (a former FCIS graduate) has contacted a girl whom I'll be meeting tomorrow, God willing, to help us with the hardware.
  2. Another Live Demo - After fixing the threshold bugs and the hardware!
  3. twAgent - Fouad and Mahmoud
  4. twGesture - Alaa and Roaa
  5. Finish Documents.
  6. Seminars:
    • Saturday: Our third seminar at FCIS - Scope: Input and Detection modules.
    • Tuesday: Seminar before the QAAP Committee.

Saturday, February 2, 2008

Track Me ..

OK, I'll discuss the tracking module: twTracker.

twTracker has mainly one class, BlobTracker, responsible for tracking blobs (obviously!).
It works using a simple match-and-mark algorithm, nothing sophisticated so far. I had to ask Roaa to make some code changes in her detection module so that blobs are stored in a sorted list, as I'll describe shortly. I also had to make some further modifications to the Blob class.

Now each blob mainly has the following attributes:
  • int m_id: Carries the blob ID.
  • Point m_center: Holds the x- and y-coordinates of the center of mass of the blob.
  • float m_dist: Represents the actual distance between the origin (0,0) and m_center, the center of the blob (x,y).
  • bool m_checked: Is a flag used in the tracking algorithm.

The tracker is based on comparing two lists of blobs each frame to find the closest two blobs within a constant distance range - a threshold - and updating a static list of "current" blobs. To ease the comparing, or the "searching" process, I suggested keeping the blobs in a sorted list, where the sorting is based on the value of m_dist, the absolute distance between the origin and the center of the blob. It's somewhat like putting all blobs on one straight line, where the first is the closest to the origin and the last is the farthest from it (towards the opposite corner).

This is shown in the following diagram. It shows four sample blobs: A, B, C and D. Each blob is at a different distance from the origin as shown.


In the diagram, you can see that blob C is wrapped in a thick belt. This belt represents the threshold along that imaginary straight line (obtained simply by adding and subtracting the threshold value from the m_dist variable). Any blob that falls outside that belt (i.e., whose m_dist differs from that of blob C by more than the threshold) is excluded from the search process, increasing performance and reducing extra checks and comparisons. Notice also how blobs A and B can have very close values of m_dist, as shown; this causes a problem: if blob A moves, both the new A and B may match the old A.

Now I keep a static list of blobs, currentBlobs, where I keep the updated positions of the available blobs only. Each frame, I receive a list of the blobs present in the frame. I compare them to the current list and try to find the closest one within the previously mentioned threshold, based on the m_dist variable. If two new blobs match one old one, I pick the one closest to the older blob. Matched blobs are marked using the flag m_checked. There are three cases that can happen:
  1. Appearance of a new blob: It will be added to the currentBlobs list.
  2. Change in an old blob position: It will be updated and marked in currentBlobs list.
  3. Disappearance of an old blob: It will be removed from the currentBlobs list at the end, as it won't be marked.
I also keep the pressure (the number of pixels in each blob) so I can simulate the size of the visual circle representing the blob - a demo will be uploaded soon, God willing.
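
For illustration, here is a rough sketch of that match-and-mark update. The Blob fields mirror the attributes listed earlier; the threshold value, the ID assignment and the greedy matching inside the belt (including the simplified tie-breaking when two new blobs compete for one old blob) are assumptions, not the actual BlobTracker code:

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

class Blob
{
    public int m_id;
    public Point m_center;          // center of mass of the blob
    public float m_dist;            // distance from the origin (0,0) to m_center
    public int m_pressure;          // number of pixels in the blob
    public bool m_checked;          // marked when matched in the current frame
}

class BlobTracker
{
    const float Threshold = 20f;                                  // assumed distance threshold
    static readonly List<Blob> currentBlobs = new List<Blob>();   // kept sorted by m_dist
    static int nextId = 1;

    public static void Update(List<Blob> frameBlobs)              // frameBlobs also sorted by m_dist
    {
        foreach (var b in currentBlobs) b.m_checked = false;

        foreach (var nb in frameBlobs)
        {
            // Only current blobs whose m_dist lies inside the "belt"
            // [nb.m_dist - Threshold, nb.m_dist + Threshold] are candidates;
            // the sorted order would let a real implementation binary-search this window.
            Blob best = null;
            float bestDiff = Threshold;
            foreach (var cb in currentBlobs)
            {
                if (cb.m_checked) continue;
                float diff = Math.Abs(cb.m_dist - nb.m_dist);
                if (diff <= bestDiff) { best = cb; bestDiff = diff; }
            }

            if (best != null)
            {
                // Case 2: an old blob moved - update it in place and mark it.
                best.m_center = nb.m_center;
                best.m_dist = nb.m_dist;
                best.m_pressure = nb.m_pressure;
                best.m_checked = true;
            }
            else
            {
                // Case 1: a new blob appeared - give it an ID and add it.
                nb.m_id = nextId++;
                nb.m_checked = true;
                currentBlobs.Add(nb);
            }
        }

        // Case 3: blobs that were never marked have disappeared.
        currentBlobs.RemoveAll(b => !b.m_checked);
        currentBlobs.Sort((a, b) => a.m_dist.CompareTo(b.m_dist));
    }
}
```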

More Input

DirectShow, the component your media player is built upon, consists of filters, which you may have noticed while configuring that media player.

Each filter has one or more input pins and/or one or more output pins.

A compressed image enters at the input pin of a "decoder filter" and comes out uncompressed from the output pin to the next filter in the chain, the "video renderer filter", which displays it. Since the renderer has no output pin (it is a sink), it drops the image and sends a ready message upstream to the decoder filter (the intermediate transform filter), telling it to send the next image, which in turn tells the "file filter" (the source) to push another image downstream. That is called a DirectShow graph.

OK, about twInput:
we have 3 main stages in the twGraph :)
Source...
1- a capture filter (the camera driver) or a video file reader
2- any necessary decoding/resizing filters to make the images ("media samples") understandable to the next filters
Transition...
3- a sample grabber; it reads the samples (images) and sends them to a callback where we can process them - you can either get a copy of the data or the original data that will be sent downstream again
Sink...
4- again, some decoders so the next filter understands the samples
5- a video renderer ("now you have your media player") or a null renderer; it displays the images received, then drops the space allocated for the images.
...

Each filter can have preferred media types, so when DS tries to connect an output pin with an input pin, it finds the most favourable media type. If the pins don't prefer the same type, they can never connect directly; DS resolves this by inserting some filters in between (decoders/converters) in an action called "Intelligent Connect".
...
All this ado is wrapped in a simple interface (start/capture/stop/readFromFile), so whenever a decision is made to support other platforms/devices it should be somewhat easy ("dreaming").
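
For the record, a sketch of what that thin wrapper could look like; the member names follow the post (start/capture/stop/readFromFile), while the signatures and the callback shape are assumptions, not the actual twInput interface:

```csharp
using System;

public interface IFrameSource
{
    void Start();                                   // build the DirectShow graph and start streaming
    byte[] Capture();                               // grab the most recent frame on demand (assumed semantics)
    void Stop();                                    // stop streaming and release the graph
    void ReadFromFile(string path);                 // use a video file instead of the live camera
    event Action<byte[], int, int> FrameCaptured;   // raw frame bytes, width, height (assumed callback shape)
}
```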
...
Note: we did not use ATL to convince DS to get programmed, maybe because even if we had used it, it wouldn't look like a C# COM interface ("did they call that Interop?"),
or maybe because ATL is not a walk in the park ("I think"),
or maybe because we didn't need to use any events or sinks or monks or any of the other crazy COM stuff that I'm not aware of and that ATL was built for.
Just wanted to remind myself of that.

Touchwork Input

In this post I will discuss the twInput component and try to explain some of the decisions made during its development.

Facts:

1- twInput is developed from scratch on top of DirectShow.
So why not use an existing library?
  • Input is the main dependency of the framework, so if it is not flexible enough it will force the framework to take a certain shape, i.e. we would be thinking "how to write TW and still fit with the input library" instead of "how to write TW". And even once it is done, we can change it at any time to adapt to changes in the main framework.
  • We needed the input to come from a camera or a video file, so we can easily test the code even if the hardware is not available; this was lacking in most of the libraries.
  • Imagine spending a day learning a library's interface, then after a month having to edit the library itself, so you move on to learn the interface of the library it depends on, and then understand the code of the top library dealing with the lower-level one...
    just too much.
2- There are two main low-level alternatives to DirectShow.
Why not use them?
  • ActiveMovie: outdated; it's now called DirectShow :D
  • Video for Windows (VFW): much easier than DS, but outdated; mostly made for VxD drivers, whose production stopped around 2000 - it's WDM now... more details in the proposal.
Now I will move on to the design of twInput in another post.

Friday, February 1, 2008

My World of Silicon ..

In a desperate search for a good compliant surface, I had to start experimenting at home. Most people abroad used Rosco sheets, silicone sheets, milk paper, caulk paper, and even silk. Silicone sheets gave the best results so far; unfortunately, we don't have those here in Egypt.

Just a reminder: Why do we need a compliant surface?
  1. So "moving" your fingers over the surface becomes easier, which provides better tracking. The motion is usually not "smooth" enough over bare acrylic. It also holds fewer marks and stains compared with the bare acrylic.
  2. It works as a projection surface. Projecting light onto a transparent medium shows almost nothing.
  3. It dims the back-light on the other side of the acrylic sheet, which really helps when working in a well-lit room.
So, I decided to try making my own silicone sheet at home.
I bought a silicone tube, emptied it into a tray, and left it to dry till the next day (I'll add photos ASAP).

Results:
  • Silicone is very thick; spreading it evenly on a surface is nearly impossible. A solvent is needed to make it thinner; it could then even be sprayed.
  • Also the thickness creates millions of air bubbles, which will cause noise later in the detection process.
  • With a live test over the hardware, it's PERFECT!!! :D
I contacted two companies to help us out; one hung up, and the other knows nothing about silicone. Therefore, the next step will be finding a solvent that will help me reduce the thickness of the silicone so I can spread it evenly across the sheet.