What is a tinyurl? we can represent a long url with a 6-letter url by using alphabets [0-9, a-z, A-Z].
e.g., http://goo.gl/0wV1lU
1. Convert a long url to shorten url
f(long url ) = shorten url
E.g., f(www.weihungliu.com/myminelink) = http://goo.gl/cb
Normally, we store long url in our database, and it will have an corresponded id, which is auto_incremented
Hence, we can translate our problem into convert id to a shortened url
2. Convert id to shorten url
Suppose id is 125 = 1*10^2 + 2*10^1+ 5*10^0, it is 10-base
since we have 62 alphabets = 10(0-9)+26(a-z)+26(A-Z) = 62, so we can convert 10-base to 62-base
then 125 = 2*62^2+1*62^0
Why we want to do this ? 10-base to 62 base?
Since in this case we can represent 125 = 2*62^2+1*62^0 as a simple "cb", where 2 denotes c and 1 denotes b. This is called Rolling hash.
[table
0 → a 
1 → b
... 
25 → z
... 
52 → 0 
...
61 → 9]
61 → 9]
this is where "cb" of http://goo.gl/cb come from.
 public String shorturl(int id, int base, HashMap map) {
  StringBuilder ret = new StringBuilder();
  while (id > 0) {
    int digit = id % base;
    ret.append( map.get(digit) );
    id /= base;
  }
  while ( res.length() < 6 )  ret.append('0');
  return ret.reverse().toString(); 
}
3. Convert shorten url to id
It's easy though.
http://goo.gl/cb
we convert c to 2 and b to 1 and use 62-base numeric then id become 2*62^1+1*62^0 = 125
4. Single machine VS Multiple machine
The above approach is well suited for single machine, However, on Multiple machine we need to think the other way.
SOURCE FROM http://n00tc0d3r.blogspot.com/2013/09/big-data-tinyurl.html
| 
On Multiple Machine 
Suppose the service gets more and more traffic and thus we need to distributed data onto multiple servers. 
We can use Distributed Database. But maintenance for such a db would be much more complicated (replicate data across servers, sync among servers to get a unique id, etc.). 
Some distributed datastore (e.g. Amazon's Dynamo) uses Consistent Hashing to hash servers and inputs into integers and locate the corresponding server using the hash value of the input. We can apply base conversion algorithm on the hash value of the input. 
The basic process can be: 
Insert 
 
Retrieve 
 | 
5. References
http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener
http://n00tc0d3r.blogspot.com/2013/09/big-data-tinyurl.html
http://n00tc0d3r.blogspot.com/2013/09/big-data-tinyurl.html
 
No comments:
Post a Comment