Archive for the ‘Development for translate-chat system for 6.7 billion’ Category

I began writing this blog on 18 January 2011 with some basic ideas on how a tractable network application capable of simultaneous one-to-one translate-chat could be possible using quite elementary ideas.  Since then I have made some progress developing the ideas.  The basic idea is that if the servers in the network knew at a fixed time who is chatting with whom, then the problem of  overwhelming traffic can be avoided by servers blocking messages for connections not on the list.  This fixes the message traffic for a fixed “state” of the network, and pushes down the traffic problem to the maintenance of a synchronized, or semi-synchronized, list of who is talking to whom.

We model the table of who is talking to whom using a B+ tree which has some nice balance of access and insertion speeds and maintenance of this table can be handled by the ability of servers to pass B+ tree data to each other.  So long as this operation can be made efficient, the application can be made viable.  I completed coding such an example using a modification of the B+ tree implementation from Tree::BPTree (to add functions for string serialization and string reading) and then a TCP server that maintains such a B+ tree, obtains in string form data representing another B+ tree to merge that it obtains from a TCP client.  Both the TCP server and client can be quickly written using POE::Component::Server::TCP and POE::Component::Client::TCP.  A red-black tree also has excellent qualities and the CPAN implementation is more efficient than the B+ tree implementation, so I subclassed Tree::RedBlack to add methods to_str, from_str, and merge in class RBTree and used it for a test.   In the test, there is a TCP server and a client.  This is the server code:


use warnings;
#use strict;
use RBTree;

use POE qw(Component::Server::TCP);

my $tree = RBTree->new;

for (my $i = 0; $i <= 10000; $i++) {

Port => 12345,
ClientConnected => sub {
print “got a connection from $_[HEAP]{remote_ip}\n”;
$_[HEAP]{client}->put(“Smile from the server!”);
$_[HEAP]{client}->put( $tree->to_str );
ClientInput => sub {
my $client_input = $_[ARG0];
my $ntree = RBTree->new;
$ntree->from_str( $client_input );
#$_[HEAP]{client}->put( $client_input );

This is the client code:


use warnings;
#use strict;
use RBTree;

use POE qw(Component::Client::TCP);

my $tree = RBTree->new;

for (my $i = 100; $i < 200; $i++) {

RemoteAddress => “localhost”,
RemotePort => 12345,
Connected => sub {
$_[HEAP]{server}->put(“HEAD /”);
#$_[HEAP]{client}->put( $tree->to_str );
ServerInput => sub {
my $input = $_[ARG0];
my $ntree = RBTree->new;
$ntree->from_str( $input );
$tree->merge( $ntree );
print “New data: ” . $tree->to_str . “\n”;

#$_[HEAP]{client}->put( $client_input );

Practical Considerations

There are 6.7 x 10^9 human beings in the world, and thus 49 x 10^18 integers are sufficient to specify pairs uniquely, therefore 64 x (1024)^6 integers are sufficient to speficy pairs uniquely, because 1000 = 10^3 < 2^10 = 1024.  But 64 x (1024)^6 = 2^66, and therefore 66 bits of data are sufficient to specify pairs of  humans uniquely if we order humans from 1 to 6.7 x 10^9.

Now abstractly, we could transfer data about pairs of humans in communication from one server to another by raw data of bits.  Assume that each were encoded by 66 bits.  How much could be reasonably transmitted in a second?  We have to make some assumptions about bandwidth of connections.  Even in relatively ‘underdeveloped’ parts of the world, a reasonable assumption is that the servers are connected by ADSL-speed connections.  These have speeds of around 1.544 Mbits/sec,  which can fit at least 23,000 contiguous blocks of 66 bits per second.

Therefore, to the extent we are able to write software that approximates no need for more than around half that bandwidth for synchronizing, let’s say around 10,000 changes per second specified by 66-bit blocks at most, then we can have a viable internet-based system for one-to-one translate chat that can be used by 6.7 billion simultaneously.


We assume communication is one-to-one at any given instant of time.

We assume also that an agent can accept a delay of a short amount (say 0.5 seconds) when switching communication partners.

We are interested in a system where 6.7 billion can communicate with each other real time.

We assume that each agent has a preferred language of communication and cannot communicate in any other language.

Label each person by a numeral from 1 to 6.7 billion.

At any instant in time, consider a vector whose values is the label of the communication partner.  This vector represents one-to-one communication in the entire planet.

An auxiliary vector of length 6.7 billion tracks the preferred language of every person.

At any instant, at most 6.7 billion “connections” are available, with fixed language on either end.  An automated language translator can translate all communication by the agents.

A simple prototype of an actual web-based translator is perl’s WWW::Babelfish module.

Based on this back-end communication scheme one can develop a front-end that is Facebook-like but has no restrictions on number of friends.  Translation can be used to translate not only the real-time chat but entire status pages.  The restriction on number of friends is removed for the purpose of one-to-one chat.

The basic internet chat mechanism is Internet Relay Chat (IRC) developed in 1993 by the Network Working Group.  It is a teleconferencing protocol using TCP/IP.  There are several perl modules available from CPAN:



POE (Perl Object Environment) is a suite of libraries for quick network application development whose multitasking capabilities are event-driven rather than based on forked processes and threads. This is perfect for my needs, which is to prototype a system for translation-chat among 6.7 billion clients simultaneously.  A whitepaper describing POE can be found here.  Sample code that uses POE for a simple chat server can be found here.  In July 2004 Sungo presented two articles giving an introduction to POE with part one and part two.

It is exceedingly simple to track the switchboard of communication between 6.7 billion clients in a vector of length 6.7 billion but the problem is the mechanism to handle the communication with a server that could handle the updating such a “switchboard” vector.

IRC with “blocking servers” that each keep a vector of length 6.7 billion. If a local client sends a message to someone it is talking to, message goes through, otherwise the server “broadcasts” change of conversation partner, waits for response and THEN lets the message through.

Intermediate Step

Just so we have a useful example of working code using the event-driven mechanism of POE, here is a program that uses POE::Wheel::ReadLine to translate sentences typed in English into French.  The interest in the program is the use of the POE infrastructure to communicate with the terminal and keyboard.


use warnings;
use strict;

use POE qw(Wheel::ReadLine);
use WWW::Babelfish;

my $s = WWW::Babelfish->new(
service => ‘Yahoo’,
agent => ‘Mozilla/8.0’
die “Babelfish service unavailable\n” unless defined $s;

inline_states=> {
_start => \&setup_console,
got_user_input => \&handle_user_input,


sub handle_user_input {
my ($input, $exception) = @_[ARG0, ARG1];
my $console = $_[HEAP]{console};

unless (defined $input) {
$console->put( “$exception caught.”);
$_[KERNEL]->signal($_[KERNEL], “UIDESTROY” );

my $tr_sentence = $s->translate(
source => ‘English’,
destination => ‘French’,
text => $input,

$console->put(”  Translation: $tr_sentence”);
$console->addhistory( $input );
$console->get(“Go: “);

sub setup_console {
$_[HEAP]{console} = POE::Wheel::ReadLine->new(
InputEvent => ‘got_user_input’
“Enter some text.”,
“Ctrl-C or Ctrl-D exits.”
$_[HEAP]{console}->get(“Go: “);

Further Refinements of the fundamental problem

Suppose a network consists of client nodes and server nodes just as in the IRC model, and suppose furthermore that the server nodes have a fixed ‘state variable’ that tracks which client is talking to whom, and this state variable is fixed.  Then there is no real traffic problem except that which arises from the organization of the client-server network, i.e. if a large number of clients are connected to a single network, for example, then there is an inherent bottleneck for communication, but message traffic is tractable if there is fixed one-to-one communication.

Thus the problem reduces to that of updating the changes in communication partners in some reasonable granularity in time.  Regardless of the network topology, the solution of the problem of updating the network’s knowledge of who is talking to whom in an efficient manner makes the idea of 6.7 billion having ability for translate-chatting with anyone feasible.

The basic idea for the solution to this problem I have is that while chatting is asynchronous, if servers could broadcast the change in state in a periodic fashion to the other servers notifying them of all the changes it becomes aware of and these are highly compressed and only changes the state variable for those one-to-one connections for which it gets “confirmations”, then the problem can be resolved.  This is feasible since one expects the number of places of a vector of length 6.7 billion that will have changes at any point in time to be extremely sparse from a single server, and thus the change signals broadcast to the network for changes will be very short messages (compared to the entire length of the state vector).

Practical problems and need for compromise

If the RAM of an average computer exceeded, say, 100 gigabytes, then a state vector of length 6.7 billion would not be problematic, for it is in the order of 6.7 GB.  A vector kept in a contiguous block of memory of such a size would have constant cost in access and changes.  Unfortunately, the average computer today cannot be assumed to have more than about 1-2 gigabytes of RAM. It is not problematic to assume that a file of size O( 6.7 GB ) is kept in the filesystem by a server program, but it is unreasonable to maintain such a state vector in RAM.  Therefore, a compromise is necessary if we are to implement the system proposed above.

The compromise is to assume that a server can track data in a balanced tree instead of a large vector.  A balanced tree has reasonably fast access, in the order of log(N).  Basic information about B-trees can be found here.  Perl has pre-written B-tree implementations that one needs to analyze for the speed of merges.  Essentially, a server keeping infomation about who is talking to whom (among 6.7 billion) can both maintain and send information about changes to this state in a B-tree structure, and send this information to other servers which can merge B-trees in memory.  A separate “garbage-collection type” sweep can synchronize the in-memory B-trees with a state vector in the filesystem.

The problem of one-to-one translate chat among 6.7 billion boils down to being able to do the following: two servers A and B each hold a B+ tree in memory; A must be able to serialize the tree and pass the information to B which must be able to read a B+ tree from the message and merge it with the tree it is holding.

There exists a B+ tree implementation called Tree::BPTree in CPAN.  The performance is not great but the interface is clean.  I tested this for use.  POE::Component::IRC or TCP could be used for messaging with contents of a B+ tree.

We need to prototype something basic: a number of servers each of which tracks a B+ tree containing information about who is talking to whom where changes are quickly notified — and it seems reasonable to use POE::Wheel::ReadWrite to communicate via sockets.

Code example for the POE::Wheel::ReadWrite gives a sense of how this can be accomplished:

use IO::Socket::INET;

use POE qw(Wheel::ReadWrite);


inline_states => {

_start => sub {

$_[HEAP]{client} = POE::Wheel::ReadWrite->new(

Handle => IO::Socket::INET->new(

PeerHost => ‘www.yahoo.com’,

PeerPort => 80,


InputEvent => ‘on_remote_data’,

ErrorEvent => ‘on_remote_fail’,


print “Connected.  Sending request… \n”;

$_[HEAP]{client}->put( “GET / HTTP/0.9”, “Host: http://www.yahoo.com”. “”);


on_remote_data => sub {

print “Received: $_[ARG0]\n”;


on_remote_fail => sub {

print “Connection failed. Shutting down…”;

delete $_[HEAP]{client};






Read Full Post »