GSoC 2017: Phase 2 Evaluation

By the end of Phase 2, I have completed the following tasks:

  1. Create the Micro window and transfer the MediaElement to it.
  2. Implement a video transition mechanism that reacts to speaker changes.
  3. Develop basic toolbar buttons embedded in Micro mode.
  4. Modularize and refactor the Micro mode code.

https://github.com/jitsi/jitsi-meet-electron/pull/12/commits

 

There are two modules I created for this project: ‘p2pconnection’ and ‘micromode’.

The ‘p2pconnection’ module is responsible for transferring the Jitsi-meet MediaElement from one Electron BrowserWindow to another using WebRTC. More details can be found in my previous post.

The ‘micromode’ module is simply a compilation of all the code I have written on top of the existing codebase to make Micro mode work. Instead of writing everything in the main.js and render.js files, I have refactored it out into a module for better readability. Currently, the Jitsi-meet-electron app’s code consists of three main parts: the main.js, render.js and micro.js files. ‘main.js’ runs in Electron’s Main Process, while ‘render.js’ and ‘micro.js’ run in Renderer Processes, each responsible for one BrowserWindow.

In each process, the respective part of the ‘micromode’ module is imported, initialized, and disposed of once the application is closed.

Selection_062

As shown above, the main process simply has to require the ‘micromode’ module and call its init, show, hide and dispose methods whenever they are needed.
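As a rough illustration of that lifecycle, here is a minimal sketch built around a stand-in object. Only the four method names (init, show, hide, dispose) come from the module; createMicroMode and the injected createWindow factory are hypothetical names invented for this example and are not the module's actual API. In the real app the factory would presumably return an Electron BrowserWindow, whose show(), hide() and destroy() methods the fake window below imitates.

```javascript
// Hypothetical stand-in for the main-process half of 'micromode'.
// `createWindow` is an assumed factory; in Electron it would return a
// BrowserWindow, which provides show(), hide() and destroy().
function createMicroMode(createWindow) {
  let win = null;
  return {
    init() {
      if (win === null) win = createWindow();
    },
    show() {
      if (win !== null) win.show();
    },
    hide() {
      if (win !== null) win.hide();
    },
    dispose() {
      if (win !== null) {
        win.destroy();
        win = null;
      }
    },
  };
}

// Usage with a fake window that records the calls it receives.
const calls = [];
const fakeWindow = {
  show: () => calls.push('show'),
  hide: () => calls.push('hide'),
  destroy: () => calls.push('destroy'),
};
const microMode = createMicroMode(() => fakeWindow);
microMode.init();
microMode.show();
microMode.hide();
microMode.dispose();
console.log(calls.join(', ')); // show, hide, destroy
```

Keeping the window handle private to the object means the caller never touches the window directly, which is the readability benefit the refactoring is after.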

 

There are a few potential areas of improvement from the current version:

1. Add more features to Micro mode’s toolbar

Micro mode currently has audio mute, video mute and hangup features, but the Jitsi-meet application has plenty of other functionality that Micro mode could also provide, such as chat and screen sharing. Some features are not suitable for Micro mode, such as live streaming, shared documents and shared YouTube videos. The screen sharing feature goes especially well with Micro mode because users might want to show their desktop while watching the remote video at the same time.

 

2. Optimize the video element in Micro mode.

Micro mode occasionally lags: the video can be a fraction of a second behind the original video, and the transition animation is sometimes clunky. This is probably caused by the WebRTC video transmission overhead or by the Micro window consuming too many resources. One thing I can try for now is lowering the video resolution in Micro mode.

 

3. Switch to more reliable WebRTC technologies.

Currently, Micro mode’s modules use several experimental WebRTC features, which are quite unstable and some of which are deprecated. I had no choice because there are no practical alternatives. As WebRTC matures, follow-up maintenance work will be needed to switch to newer and more stable technologies.

 

Conclusion

The more I work on this project, the more I realize how little IPC support the Electron framework provides, which leads to spaghetti code and runtime overhead and has given me a tremendous headache during development. Nonetheless, the current version of Micro mode finally serves its purpose, so the rest of the development will mainly be optimization and adding more functionality.

Game AI: Snake Game

snake.gif

https://github.com/leook0209/Genetic-Algorithm-Snake

Snake is a simple game in which the player moves the head of the snake up, down, right or left to eat randomly generated food. The snake grows by one every time it eats the food, and it dies once it hits any part of its body. This project is about training a utility-based snake game agent using a genetic algorithm with a number of heuristics.

 

Fitness

The goal (fitness) of each game is to make the snake as long as possible while taking as few turns as possible to finish the game. Fitness is calculated as follows:

fitness = Length of Snake - α * (Number of Turns Taken)

α: weight for Number of Turns Taken
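The formula translates directly into a small function. This is a minimal sketch: the function and parameter names are made up for illustration, and the default α of 0.01 is only a placeholder, since the value actually used in training is not stated here.

```javascript
// Fitness as defined above: snake length minus a penalty per turn taken.
// The default alpha is a placeholder, not the value used in the project.
function fitness(snakeLength, turnsTaken, alpha = 0.01) {
  return snakeLength - alpha * turnsTaken;
}

// Two games that end at the same length: the faster one scores higher.
const quick = fitness(30, 400);
const slow = fitness(30, 2000);
console.log(quick > slow); // true
```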

The reason for minimizing the number of turns is that the snake game can be beaten with a very easy strategy: circle the snake around the edge and safely eat the food on the inner side of the field. Hence, α should be set to a reasonable value to prevent the agent from taking the easy way out.

At each turn, the agent calculates a heuristic value for moving in each direction: up, down, left and right. If a direction leads to the snake’s death, the heuristic value is NEGATIVE INFINITY. Even if a position next to the head contains the food, the snake might decide not to take it if doing so leads to a less desirable future state (e.g. creating a dead end). To prevent the Snake from repeating the same motion over and over again, it is designed to become more attracted to the food over time.
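The per-turn decision can be sketched roughly as follows. This assumes the heuristic value of a move is a weighted sum of heuristic features, which is suggested by the heuristic weights the genetic algorithm evolves but is not spelled out; the function names, the feature arrays and the numbers in the example are all invented for illustration.

```javascript
// Score one candidate move as a weighted sum of heuristic features.
// `features` and `weights` are parallel arrays (an assumed representation).
function scoreMove(features, weights, isFatal) {
  if (isFatal) return Number.NEGATIVE_INFINITY;
  return features.reduce((sum, f, i) => sum + f * weights[i], 0);
}

// Pick the direction with the highest score.
// `candidates` maps a direction name to { features, isFatal }.
function chooseDirection(candidates, weights) {
  let best = null;
  let bestScore = Number.NEGATIVE_INFINITY;
  for (const [direction, c] of Object.entries(candidates)) {
    const score = scoreMove(c.features, weights, c.isFatal);
    if (best === null || score > bestScore) {
      best = direction;
      bestScore = score;
    }
  }
  return best;
}

// Made-up example with two features: [distance to food, compactness].
const weights = [-1, 0.5];
const candidates = {
  up: { features: [3, 2], isFatal: false },    // score -2
  down: { features: [5, 1], isFatal: false },  // score -4.5
  left: { features: [1, 0], isFatal: true },   // score -Infinity
  right: { features: [4, 6], isFatal: false }, // score -1
};
console.log(chooseDirection(candidates, weights)); // right
```

Note that "left" is closest to the food but is never chosen because it is fatal, and "right" wins over "up" despite being farther from the food, which mirrors the point that the snake may pass up food for a better future state.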

 

Heuristics

The agent uses 6 heuristics to calculate the heuristic value of each move:

  1. Manhattan distance between the Snake’s head and Food
  2. The position of the Snake’s head from the center of the field
  3. Squareness of the Snake
  4. Compactness of the Snake
  5. Connectivity of the field
  6. Dead End Indicator

The first two heuristics are quite intuitive, while the next four are not. Those are heuristic concepts I created for this agent.

Squareness

Squareness is an indicator of how closely the Snake’s body is arranged in a square/rectangular manner.

O – Empty Space,  S – Snake,  X – Blank Space,  H – Snake’s Head

O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O S S S H O O O O O O O O O
O O O O O O O S S S S O O O O O O O O O
O O O O O O O S S S S O O O O O O O O O
O O O O O O O S S S S O O O O O O O O O
O O O O O O O S S S S O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O

The above example’s snake is oriented in a perfect rectangular manner. In this case, the squareness value is 0.

O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O S X X X O O O O O O O O O
O O O O O O O S X X X O O O O O O O O O
O O O O O O O S X X H O O O O O O O O O
O O O O O O O S S S S O O O O O O O O O
O O O O O O O S S S S O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O

The squareness value is the number of blank spaces that are within the Snake’s square boundaries but not filled by the snake. The square boundaries refer to the rectangular space spanned by the leftmost, rightmost, uppermost and lowermost parts of the snake. For the above case, the squareness value is the number of Xs, which is 8.
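A minimal sketch of this calculation, assuming the snake is stored as a list of [row, col] cells with no duplicates. The helper names and the text-grid parsing are only there to make the example readable and are not taken from the project.

```javascript
// Turn rows like 'S X X H' into a list of [row, col] snake cells.
// 'S' and 'H' are snake; every other symbol is ignored.
function parseSnake(lines) {
  const cells = [];
  lines.forEach((line, r) => {
    line.split(' ').forEach((ch, c) => {
      if (ch === 'S' || ch === 'H') cells.push([r, c]);
    });
  });
  return cells;
}

// Squareness: area of the bounding rectangle minus the snake's own cells.
function squareness(snake) {
  const rows = snake.map(([r]) => r);
  const cols = snake.map(([, c]) => c);
  const height = Math.max(...rows) - Math.min(...rows) + 1;
  const width = Math.max(...cols) - Math.min(...cols) + 1;
  return height * width - snake.length;
}

// The two examples above, cropped to the snake's bounding rectangle.
const rectangle = parseSnake([
  'S S S H',
  'S S S S',
  'S S S S',
  'S S S S',
  'S S S S',
]);
const hooked = parseSnake([
  'S X X X',
  'S X X X',
  'S X X H',
  'S S S S',
  'S S S S',
]);
console.log(squareness(rectangle)); // 0
console.log(squareness(hooked)); // 8
```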

Compactness

Compactness is an indicator of how compactly the Snake’s body is oriented. It is the number of cases where one body part of the Snake is placed next to another body part of the Snake, without double counting.

O – Empty Space,  S – Snake,  H – Snake’s Head

O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O S S S H O O O O O O O O O
O O O O O O O S S S S O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O

For the above case, the compactness of the Snake is 10.
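One way to count this without double counting is to look only to the right and downward from each body cell. The sketch below assumes the snake is a list of [row, col] cells; the function name is chosen for illustration.

```javascript
// Compactness: number of pairs of body cells that touch side by side.
function compactness(snake) {
  const occupied = new Set(snake.map(([r, c]) => `${r},${c}`));
  let pairs = 0;
  for (const [r, c] of snake) {
    // Checking only the right and lower neighbour counts each pair once.
    if (occupied.has(`${r},${c + 1}`)) pairs += 1;
    if (occupied.has(`${r + 1},${c}`)) pairs += 1;
  }
  return pairs;
}

// The 2 x 4 block from the example above.
const block = [];
for (let r = 0; r < 2; r++) {
  for (let c = 0; c < 4; c++) block.push([r, c]);
}
console.log(compactness(block)); // 10
```

The 10 breaks down into 6 horizontal pairs (3 per row) and 4 vertical pairs (one per column).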

Connectivity

Connectivity is an indicator of how connected each part of the field is, and whether the Snake is separating one part of the field from another.

O – Empty Space,  S – Snake,  H – Snake’s Head,  X – Space Chosen by Agent

O O O O O O O O O S S H O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O X O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O S S O O O O O O O O O O

At each turn, the agent picks a random empty space in the field and counts how many spaces are disconnected from that space because they are blocked by the Snake’s body. For the above case, the connectivity is 148.

Dead End Indicator

Dead End Indicator represents how many spaces are unreachable by the snake based on the current orientation.

O – Empty Space,  S – Snake,  H – Snake’s Head

O O O O O O O O O S S H O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O S S O O O O O O O O O O

The Dead End Indicator is calculated in a similar manner to Connectivity, except that instead of choosing a random empty space, it checks connectivity from the Snake’s head. For the above case, the left side of the field is unreachable from the Snake’s head, hence the Dead End Indicator value is 134.
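Both Connectivity and the Dead End Indicator can be computed with the same flood fill, differing only in the starting cell. The sketch below is one possible implementation, not necessarily the one used in the project; it assumes the field is an array of strings where 'O' is empty and any other character blocks movement, and the function name is hypothetical.

```javascript
// Count empty cells that cannot be reached from `start` by moving
// up, down, left or right through empty ('O') cells.
function countUnreachable(grid, start) {
  const key = (r, c) => `${r},${c}`;
  const seen = new Set([key(start[0], start[1])]);
  const queue = [start];
  // The start cell counts as reached only if it is itself empty
  // (it is not when starting from the snake's head).
  let reached = grid[start[0]][start[1]] === 'O' ? 1 : 0;
  while (queue.length > 0) {
    const [r, c] = queue.shift();
    for (const [dr, dc] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
      const nr = r + dr;
      const nc = c + dc;
      if (nr < 0 || nr >= grid.length || nc < 0 || nc >= grid[nr].length) continue;
      if (grid[nr][nc] !== 'O' || seen.has(key(nr, nc))) continue;
      seen.add(key(nr, nc));
      reached += 1;
      queue.push([nr, nc]);
    }
  }
  const totalEmpty = grid.join('').split('').filter((ch) => ch === 'O').length;
  return totalEmpty - reached;
}

// The 20 x 15 field from the examples above.
const field = [
  'O'.repeat(9) + 'SSH' + 'O'.repeat(8),
  ...Array(13).fill('O'.repeat(9) + 'S' + 'O'.repeat(10)),
  'O'.repeat(8) + 'SS' + 'O'.repeat(10),
];

// Connectivity: start from the chosen empty space (row 5, column 3).
console.log(countUnreachable(field, [5, 3])); // 148
// Dead End Indicator: start from the snake's head (row 0, column 11).
console.log(countUnreachable(field, [0, 11])); // 134
```

The field has 282 empty spaces in total, 134 on the left of the snake and 148 on the right, so the two values are complementary in this example.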

 

After a few rounds of training, the genetic algorithm tends to maximize Compactness and minimize Distance from Food, Squareness, Connectivity and Dead End, while it does not really care about the distance from the center of the field.

 

Genetic Algorithm

The genetic algorithm for the Snake Game agent has a population size of 500, a mutation rate of 0.05 and a survival rate of 0.5. For each weight set, the game is played 3 times and the arithmetic mean of the results is taken, in order to reduce the effect of randomly generated food positions.
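The averaging step might look like the sketch below. Here playGame is a hypothetical function that plays one full game with the given weights and returns its fitness; it is stubbed out in the example so the snippet runs on its own.

```javascript
// Average the fitness of one weight set over several games.
// `playGame(weights)` is a hypothetical function that plays one game
// and returns its fitness.
function averageFitness(weights, playGame, games = 3) {
  let total = 0;
  for (let i = 0; i < games; i++) total += playGame(weights);
  return total / games;
}

// Stub standing in for three games with different food placements.
const results = [10, 20, 30];
const stubGame = () => results.shift();
console.log(averageFitness([0.5, -1], stubGame)); // 20
```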

Crossover

At each generation, the population is sorted by fitness value, and two parents are chosen from the surviving top 50% of the population, with a Snake of higher fitness having a proportionally higher chance of being chosen. A child is created by taking the weighted average of each of the parents’ heuristic weights.
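A minimal sketch of this step follows, with two assumptions that are not confirmed above: the weighted average uses each parent's fitness as its weight, and all fitness values are positive (roulette-wheel selection as written here breaks down otherwise). The function names and the { weights, fitness } object shape are invented for illustration, and the random source is injectable so the behaviour can be checked deterministically.

```javascript
// Keep the top fraction of the population, sorted by fitness (descending).
function survivorsOf(population, survivalRate = 0.5) {
  const sorted = [...population].sort((a, b) => b.fitness - a.fitness);
  return sorted.slice(0, Math.ceil(sorted.length * survivalRate));
}

// Roulette-wheel selection: chance proportional to fitness.
// Assumes every fitness value is positive.
function selectParent(survivors, random = Math.random) {
  const total = survivors.reduce((sum, s) => sum + s.fitness, 0);
  let threshold = random() * total;
  for (const s of survivors) {
    threshold -= s.fitness;
    if (threshold < 0) return s;
  }
  return survivors[survivors.length - 1];
}

// Child weights: per-heuristic average of the parents' weights,
// weighted by each parent's fitness (an assumption).
function crossover(a, b) {
  const total = a.fitness + b.fitness;
  return a.weights.map(
    (w, i) => (w * a.fitness + b.weights[i] * b.fitness) / total
  );
}

const population = [
  { weights: [0, 4], fitness: 1 },
  { weights: [4, 0], fitness: 3 },
  { weights: [9, 9], fitness: 0.5 },
  { weights: [7, 7], fitness: 0.2 },
];
const survivors = survivorsOf(population);
console.log(survivors.map((s) => s.fitness)); // [ 3, 1 ]
console.log(crossover(population[0], population[1])); // [ 3, 1 ]
```

With this selection scheme the same Snake can be drawn as both parents, in which case the child is simply a copy before mutation.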

Mutation

Each heuristic weight of each Snake in the population has a 5% chance of being mutated by ±0.2.
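As a sketch, mutation could be written like this. It assumes that "±0.2" means a fixed step of exactly 0.2 added or subtracted with equal probability, rather than a random amount within that range; the function name and the injectable random source are illustrative.

```javascript
// Mutate each weight independently with probability `rate`.
// Assumption: a mutation adds or subtracts exactly `step`, 50/50.
function mutate(weights, rate = 0.05, step = 0.2, random = Math.random) {
  return weights.map((w) => {
    if (random() >= rate) return w;
    return random() < 0.5 ? w + step : w - step;
  });
}

// Deterministic demo: a scripted random source instead of Math.random.
const script = [0.01, 0.3, 0.9, 0.02, 0.8];
const scripted = () => script.shift();
const mutated = mutate([1, 2, 3], 0.05, 0.2, scripted);
// First weight mutates upward, second is untouched, third mutates downward.
console.log(mutated.map((w) => w.toFixed(1)).join(' ')); // 1.2 2.0 2.8
```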

 

The biggest lesson was that it is a bad idea to build a computation-intensive program for the Web (JavaScript). Due to the performance limitations of the Chrome browser, it took a very long time to train the agent with a Genetic Algorithm population of 500. Even after the training, the elite Snake still could not finish the game. The most difficult part is that the food can be generated at an unreachable position. Hence, it is wise to minimize the number of holes generated by the Snake’s body (Connectivity), but then it takes too many steps to clear the game.

GSoC 2017: Phase 1 Evaluation

In the first month of the project, I have attempted the following tasks:

  1. Capture the large video embedded inside Jitsi-meet’s iframe in the main renderer BrowserWindow
  2. Transmit it to the micro mode’s BrowserWindow
  3. Display it on an HTML video element

 

 

1. Capture the HTML Video Element inside iframe


Jitsi-meet’s largeVideo element can be extracted directly from its iframe using

iframe.contentWindow.document.getElementById('largeVideo');

I can subsequently retrieve the source MediaStream from the video’s srcObject attribute. However, if I simply display that MediaStream in the micro mode’s window, it does not react when the dominant speaker changes in the Jitsi-meet application, because the original HTML video’s srcObject attribute switches to another MediaStream.

There are two possible options to implement video transition in Micro Mode:

  1. A ‘hacky’ way. Import Jitsi-meet’s APP object and listen to the dominantSpeakerChanged event. Once the speaker changes, re-extract the largeVideo from the iframe.
  2. Capture the largeVideo displayed in the main BrowserWindow, convert it to a MediaStream and send it to the Micro Mode’s window. When the speaker changes, the capture automatically includes the video transition.

So far, I have attempted the second approach, but it has several problems. The existing version of the HTMLMediaElement.captureStream() method does not work because the largeVideo extracted from the iframe lacks a ‘currentSrc’ attribute, and hence it keeps throwing a “The media element must have a source.” error. I need to find an alternative to HTMLMediaElement.captureStream() that captures an HTML video element in real time and produces a MediaStream object.

One approach I tried was using the HTML canvas to take a snapshot of each frame of the largeVideo and render it like a video.
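A rough sketch of such a canvas approach is shown below. It is not the project's actual code: the function name and the injected schedule callback are hypothetical, and the callback exists only so the loop can be exercised outside a browser. The calls it relies on, canvas.getContext('2d'), context.drawImage(video, ...) and canvas.captureStream(), are standard browser APIs.

```javascript
// Repeatedly paint `video` onto `canvas` and expose the canvas as a
// MediaStream. `schedule` decides when the next frame is drawn; in a
// page this would typically wrap requestAnimationFrame.
function mirrorVideoToCanvas(video, canvas, schedule) {
  const context = canvas.getContext('2d');
  let running = true;

  function drawFrame() {
    if (!running) return;
    // Take a snapshot of the current video frame.
    context.drawImage(video, 0, 0, canvas.width, canvas.height);
    schedule(drawFrame);
  }

  drawFrame();
  return {
    stream: canvas.captureStream(),
    stop() {
      running = false;
    },
  };
}

// Hypothetical wiring inside the main renderer window:
//
//   const video = iframe.contentWindow.document.getElementById('largeVideo');
//   const canvas = document.createElement('canvas');
//   const { stream } = mirrorVideoToCanvas(
//     video, canvas, (cb) => requestAnimationFrame(cb));
```

One caveat with this design is that browsers generally pause requestAnimationFrame callbacks for hidden pages, so a loop scheduled this way may stop producing frames when the source window is not visible.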

 

 

2. Transmit the Video to Micro Window using WebRTC

The inherent difficulty of this task is that each Electron BrowserWindow is an independent Chromium page. There is virtually no direct way to transfer media data from one window to another. After lengthy research, I concluded that using webkitRTCPeerConnection to set up a MediaStream peer connection between the main window and the micro window is the most feasible approach.

Reference: https://www.tutorialspoint.com/webrtc/webrtc_video_demo.htm

The details of how the peer connection is implemented are described in GSoC 2017: log #1.

Since the main audio keeps playing in the background after the Jitsi-meet window is minimized, there is no need to transmit the audio to the micro window.

One major concern is performance. Running a background peer connection between the main window and the micro window might degrade the performance of the Jitsi-meet conference.

 

 

3. Retrieve the Video and Display on Micro Window

After the MediaStream of the largeVideo is received on the micro window’s side, it can be displayed simply by setting the srcObject attribute of the HTML video element. Then, the micro mode’s window is positioned in the top right corner of the screen, with the frameless and always-on-top options enabled. The end product looks like this…
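The window placement can be sketched as a small helper that builds the options object. The width, height, x, y, frame and alwaysOnTop keys are real Electron BrowserWindow options; the helper name, the default size and the margin are placeholders chosen for this example, not the values used in the app.

```javascript
// Build BrowserWindow options for a small frameless, always-on-top
// window in the top right corner of the given work area.
// `workArea` has the shape { x, y, width, height }, like the workArea
// of an Electron Display object. Size and margin are placeholders.
function microWindowOptions(workArea, width = 320, height = 180, margin = 10) {
  return {
    width,
    height,
    x: workArea.x + workArea.width - width - margin,
    y: workArea.y + margin,
    frame: false,
    alwaysOnTop: true,
  };
}

const options = microWindowOptions({ x: 0, y: 0, width: 1920, height: 1080 });
console.log(options.x, options.y); // 1590 10

// Hypothetical use in the main process:
//
//   const { BrowserWindow, screen } = require('electron');
//   const { workArea } = screen.getPrimaryDisplay();
//   const microWindow = new BrowserWindow(microWindowOptions(workArea));
//
// and in the micro window's renderer, once the stream arrives:
//
//   videoElement.srcObject = receivedStream;
```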

evaluation.png

However, several problems emerged after testing minimization of the main window.

Using an HTML canvas to capture the large video works only when both windows (main and micro) are active and visible. When I tried minimizing the main window, the video in the micro window just froze. I am guessing that the HTML canvas snapshot method works only when the target video is active (not minimized). I need to investigate whether it is possible to keep playing the target video in the background even after the window is minimized.

Furthermore, when I tested sending a local video (mp4) from the main window to the micro window, the videos in both windows ran smoothly. However, once I minimized the main window, the frame rate of the micro window’s video immediately started dropping. It kept playing, but the frame rate dropped so severely that it no longer looked like a video. I am going to research how Chromium handles background media, and whether there is a way to keep a target MediaElement active in the background.

 

Conclusion

Although the video transmission appears to be working, there are numerous internal problems I have to resolve before I move on to the next step. A lot of time goes into researching possible ways to implement the features, and only a fraction of the time is spent actually writing code. I used to look only at Stack Overflow for my programming problems, but for this project I had to read a lot of official API documentation, issue and bug trackers, and discussion threads. This is because the problems I used to solve were the kind that had one simple solution that worked cleanly. For the Jitsi-meet-electron project, however, there are many different approaches I can take to solve the same problem, and in the worst case the problem is actually unsolvable or not supported.