Thursday, October 29, 2015

[SD]TinyURL


What is a tinyurl? we can represent a long url with a 6-letter url by using alphabets [0-9, a-z, A-Z].
e.g., http://goo.gl/0wV1lU


1. Convert a long url to shorten url

f(long url ) = shorten url
E.g., f(www.weihungliu.com/myminelink) = http://goo.gl/cb
Normally, we store long url in our database, and it will have an corresponded id, which is auto_incremented
Hence, we can translate our problem into convert id to a shortened url

2. Convert id to shorten url
Suppose id is 125 = 1*10^2 + 2*10^1+ 5*10^0, it is 10-base
since we have 62 alphabets = 10(0-9)+26(a-z)+26(A-Z) = 62, so we can convert 10-base to 62-base
then 125 = 2*62^2+1*62^0
Why we want to do this ? 10-base to 62 base?
Since in this case we can represent 125 = 2*62^2+1*62^0 as a simple "cb", where 2 denotes c and 1 denotes b. This is called Rolling hash.

[table
0 → a 
1 → b ... 
25 → z ... 
52 → 0 
...
61 → 9]
this is where "cb" of http://goo.gl/cb come from.


 public String shorturl(int id, int base, HashMap map) {
  StringBuilder ret = new StringBuilder();
  while (id > 0) {
    int digit = id % base;
    ret.append( map.get(digit) );
    id /= base;
  }
  while ( res.length() < 6 )  ret.append('0');
  return ret.reverse().toString(); 
}
3. Convert shorten url to id

It's easy though.
http://goo.gl/cb
we convert c to 2 and b to 1 and use 62-base numeric then id become 2*62^1+1*62^0 = 125

4. Single machine VS Multiple machine 
The above approach is well suited for single machine, However, on Multiple machine we need to think the other way.

SOURCE FROM http://n00tc0d3r.blogspot.com/2013/09/big-data-tinyurl.html
On Multiple Machine
Suppose the service gets more and more traffic and thus we need to distributed data onto multiple servers.

We can use Distributed Database. But maintenance for such a db would be much more complicated (replicate data across servers, sync among servers to get a unique id, etc.).

Alternatively, we can use Distributed Key-Value Datastore.
Some distributed datastore (e.g. Amazon's Dynamo) uses Consistent Hashing to hash servers and inputs into integers and locate the corresponding server using the hash value of the input. We can apply base conversion algorithm on the hash value of the input.

The basic process can be:
Insert
  1. Hash an input long url into a single integer;
  2. Locate a server on the ring and store the key--longUrl on the server;
  3. Compute the shorten url using base conversion (from 10-base to 62-base) and return it to the user.
Retrieve
  1. Convert the shorten url back to the key using base conversion (from 62-base to 10-base);
  2. Locate the server containing that key and return the longUrl.

5. References
http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener
http://n00tc0d3r.blogspot.com/2013/09/big-data-tinyurl.html

No comments:

Post a Comment