A Startup A Day

Entries from January 2010

How To Encode an MP3 Using Twitter

January 24, 2010 · 3 Comments

I should probably put a massive disclaimer on this post: this is just a crackpot idea I came up with.  I haven’t actually tested it (nor do I plan to), and there are probably a ton of real-world issues with this approach, both technical and legal.  So, yeah – don’t try this at home.

A simplistic refresher on compression

(Editor’s note: as my very smart commenters have pointed out, actually *compressing* data using Twitter doesn’t make any sense.  This post is about *encoding* data using Twitter.  This part is just a refresher on the basic concepts; feel free to skip ahead if you already know this stuff.)

It’s been a couple of years since my computer science days, but I do remember the basics of a simple compression algorithm.  First, remember that all digital files are made up of a series of 1’s and 0’s.  Let’s take this pattern as an example:

0010 0100 0001 0000 0100 0001 001

The key to this compression algorithm is to find long repeating patterns, and use a lookup table to replace them with shorter ones.  Let’s take another look at our example, with spaces inserted to highlight the patterns:

001 001 000001 000001 000001 001

Now, let’s make a quick lookup table:

Pattern Replacement
001 A
000001 B

 

We can now represent those bits as follows:

AABBBA

And there we go!  Our 27 characters were replaced with 6 characters, and even adding in the extra space for the lookup table, it’s still significantly smaller. 
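
For the curious, here is a minimal Python sketch of the lookup-table trick above.  The names `TABLE`, `encode`, and `decode` are made up for this illustration, and trying the longest pattern first is just one simple way to do the replacement, not how real compressors work.

```python
# Lookup table from the example: bit pattern -> replacement symbol.
TABLE = {"001": "A", "000001": "B"}


def encode(bits, table):
    """Replace known bit patterns with their symbols, trying longer patterns first."""
    patterns = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(bits):
        for pattern in patterns:
            if bits.startswith(pattern, i):
                out.append(table[pattern])
                i += len(pattern)
                break
        else:
            # No pattern in the table matches at this position.
            raise ValueError(f"no pattern matches at position {i}")
    return "".join(out)


def decode(symbols, table):
    """Invert the table and expand each symbol back into its bit pattern."""
    inverse = {symbol: pattern for pattern, symbol in table.items()}
    return "".join(inverse[s] for s in symbols)


bits = "001001000001000001000001001"
encoded = encode(bits, TABLE)
print(encoded)
print(decode(encoded, TABLE) == bits)
```

Running `encode` on the 27 bits from the example should give back the same six letters shown above, and `decode` reverses the process.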

Of course, this is for demo purposes only – visit your local library to learn how the big boys do it.

So what does this have to do with Twitter?

Typically, compression algorithms have been used to transform a fixed amount of local information into a smaller amount of local information.  However, Twitter gives us something entirely different: a non-local store of information.  By encoding our local information into Twitter’s non-local store, we can massively decrease the amount of local information we need to keep track of.

Let’s take a look at our old example lookup table:

Pattern Replacement
001 A
000001 B

 

Now, let’s add a new column: instead of the replacement being a letter, let’s pull in two random Tweets from the public timeline:

Pattern   Replacement   Tweet
001       A             I am bored
000001    B             Good morning!

 

Using the same approach as before, let’s encode the original bits using our new Tweet replacement scheme:

AABBBA –> I am boredI am boredGood morning!Good morning!Good morning!I am bored

Wow…that really kinda sucks.  But, stick with me here for a second.  There’s an important point that we’re missing here, and that is the fact that Twitter assigns a sequential ID to each Tweet.  Let’s pretend for a second that the public timeline looked like this:

ID Tweet
10000 I am bored
10001 I am bored
10002 Good morning!
10003 Good morning!
10004 Good morning!
10005 I am bored

 

Aha!  Now we can do something interesting.  Instead of replacing the letters with Tweets, we can instead assign a starting ID, and the number of Tweets to read:

AABBBA –> 10000-6

In other words, start at Tweet ID# 10000, and read 6 sequential Tweets.  With our simple example, it doesn’t look like you’re saving much space, but hypothetically you could replace thousands and thousands of 1’s and 0’s with a starting ID and the number of Tweets to read.
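
As a rough sketch of how reading back a reference like `10000-6` might work, here is some Python.  Everything in it is an assumption for illustration: `TIMELINE` is a hard-coded stand-in for the pretend public timeline above (no real Twitter API is called), and `decode_reference` is a made-up helper name.

```python
# Pretend public timeline from the table above: Tweet ID -> Tweet text.
TIMELINE = {
    10000: "I am bored",
    10001: "I am bored",
    10002: "Good morning!",
    10003: "Good morning!",
    10004: "Good morning!",
    10005: "I am bored",
}

# The lookup table, keyed by Tweet text instead of by letter.
TWEET_TO_BITS = {"I am bored": "001", "Good morning!": "000001"}


def decode_reference(ref, timeline, tweet_to_bits):
    """Turn a 'start-count' reference back into the bits it stands for."""
    start_text, count_text = ref.split("-")
    start, count = int(start_text), int(count_text)
    # Read `count` sequential Tweets beginning at `start`.
    tweets = [timeline[start + offset] for offset in range(count)]
    return "".join(tweet_to_bits[tweet] for tweet in tweets)


print(decode_reference("10000-6", TIMELINE, TWEET_TO_BITS))
```

The only things you would need to keep locally are the short reference string and the (small) lookup table.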

Kevin – you are an idiot.  The odds of that exact pattern coming up are a bazillion to one!

I know what you’re thinking, and you’re right.  I haven’t done the math, but I’m guessing the probability of a series of random Tweets appearing in sequential order that exactly matches the pattern we need for encoding is a number very close to zero.  But wait, there’s more!

Remember, the beauty of Twitter is that each Tweet has a large amount of information attached to it.  We can use as much or as little of that info to greatly increase the probability of finding a matching pattern.  For example, let’s take our table, but instead of using the full text of the Tweet, we only use the first letter of each Tweet:

ID First letter of Tweet
10000 I
10001 I
10002 G
10003 G
10004 G
10005 I

 

Or, let’s look at the number of characters (spaces and punctuation included) in each Tweet:

ID Length of Tweet
10000 10
10001 10
10002 13
10003 13
10004 13
10005 10

 

And these are just patterns I’m using as examples to make it easy for us humans to read.  A computer armed with a neural net could easily discover many more patterns.
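
Here is a small Python sketch of the two features from the tables above.  The `VARIED` Tweets are invented for this example (they are not real Tweets), and `to_bits`, `LETTER_TO_BITS`, and `LENGTH_TO_BITS` are hypothetical names.  The point it tries to show is that a looser feature lets many different Tweets stand in for the same pattern.

```python
# The six Tweets from the pretend timeline above, in ID order.
ORIGINAL = [
    "I am bored", "I am bored",
    "Good morning!", "Good morning!", "Good morning!",
    "I am bored",
]

# Invented Tweets that differ in text but share the same first letters.
VARIED = [
    "I am bored", "Is it Friday yet?",
    "Good morning!", "Going to lunch", "Great game last night",
    "I love coffee",
]

# Lookup tables keyed by a feature of the Tweet rather than its full text.
LETTER_TO_BITS = {"I": "001", "G": "000001"}
LENGTH_TO_BITS = {10: "001", 13: "000001"}


def first_letter(tweet):
    """Feature: the first character of the Tweet."""
    return tweet[0]


def to_bits(tweets, feature, feature_to_bits):
    """Map each Tweet to a feature value, then each feature value to its bits."""
    return "".join(feature_to_bits[feature(tweet)] for tweet in tweets)


print(to_bits(ORIGINAL, len, LENGTH_TO_BITS))
print(to_bits(VARIED, first_letter, LETTER_TO_BITS))
```

Both calls produce the same bits as the original example, even though the second timeline contains five different Tweet texts.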

And remember, there are a massive number of Tweets created each day.  I just pulled the latest ID from the Twitter public timeline, and assuming they are created in sequential order, that means there are a total of 8,156,003,416 Tweets that have been created!  With that much information, the odds of being able to find a matching pattern suddenly become much more reasonable.
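
To put a rough number on those odds, here is a back-of-the-envelope sketch in Python.  It rests on a big simplifying assumption that is mine, not something measured: each Tweet yields one of two symbols with equal probability, independently of its neighbors.  Real Tweets will not behave that neatly, so treat the output as an order-of-magnitude guess.  The function names are made up for this example.

```python
import math

# Total Tweet count quoted above, assuming IDs are sequential.
TOTAL_TWEETS = 8_156_003_416


def expected_matches(total, run_length, alphabet_size=2):
    """Expected number of places where one specific run of symbols appears.

    Assumes every symbol is equally likely and independent of the others.
    """
    positions = total - run_length + 1
    return positions / alphabet_size ** run_length


def longest_likely_run(total, alphabet_size=2):
    """Longest run length for which about one match or more is expected."""
    return math.floor(math.log(total) / math.log(alphabet_size))


print(longest_likely_run(TOTAL_TWEETS))
print(expected_matches(TOTAL_TWEETS, 32))
print(expected_matches(TOTAL_TWEETS, 50))
```

Under those assumptions, a specific run of about 32 Tweets is the longest you could reasonably expect to find somewhere in the timeline, while a specific run of 50 would show up far less than once on average.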

There are other ways to improve the odds as well.  For example, you wouldn’t necessarily need to find an exact pattern that is thousands of Tweets long.  Let’s say that you needed to find a pattern that is 50 Tweets.  You could break it up into smaller chunks, like this:

10000-6

20010-19

33990-25

Now instead of 50 Tweets in a row, we only needed to find a sequence of 6 Tweets, another sequence of 19 Tweets, and a third sequence of 25 Tweets.
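
One way the chunking could be sketched in Python is a greedy search: at each position, grab the longest run in the timeline that matches what is still needed.  This is only an illustration under my own assumptions.  The timeline is a plain list of symbols (one per Tweet, in ID order), the brute-force scan would be far too slow for billions of Tweets, and `longest_match` and `chunk` are hypothetical names.

```python
def longest_match(target, pos, timeline):
    """Return (start_index, length) of the longest timeline run matching target[pos:]."""
    best_start, best_len = -1, 0
    for start in range(len(timeline)):
        length = 0
        while (pos + length < len(target)
               and start + length < len(timeline)
               and timeline[start + length] == target[pos + length]):
            length += 1
        if length > best_len:
            best_start, best_len = start, length
    return best_start, best_len


def chunk(target, timeline, first_id):
    """Split target into 'start-count' references, taking the longest match each time."""
    refs = []
    pos = 0
    while pos < len(target):
        start, length = longest_match(target, pos, timeline)
        if length == 0:
            raise ValueError(f"symbol {target[pos]!r} never appears in the timeline")
        refs.append(f"{first_id + start}-{length}")
        pos += length
    return refs


# One symbol per Tweet (first letters from the earlier table), starting at ID 10000.
timeline = list("IIGGGI")
print(chunk(list("IIGGGI"), timeline, 10000))
print(chunk(list("GGGIII"), timeline, 10000))
```

The first call finds the whole pattern in one run; the second has to be stitched together from two shorter runs, just like the three-chunk example above.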

And remember – Twitter is pumping out more and more information each day.  This dynamic quality to Twitter means that a new pattern could be discovered that would replace an old pattern.  For example, maybe for the above example, a new pattern was found that contained all 50 Tweets in one sequence!  You could then replace the original encoding:

10000-6

20010-19

33990-25

…with this encoding:

510000-50

So what?

OK – so theoretically it’s possible to encode bits by using patterns found in Twitter.  We already have .mp3 files to play songs, why do we need this?

#1 – Instead of having 10,000 3 MB files filling up your hard drive, you could fit your entire music collection into one text file that is 10,000 lines long.

#2 – There’s another very, very good reason for encoding music (or other types of media) like this…but let’s just say that the RIAA would probably not be terribly happy about an approach like this.  ‘Cause it’s pretty hard to sue people for distributing music if all they are doing is talking about what they are eating for lunch on a massively popular public site.

What do you guys think of this idea?  Completely crazy?  Or something that might actually be possible (especially once Twitter opens up their API firehose)?  Let me know in the comments or follow me on Twitter at @astartupaday.

Categories: Uncategorized

Kevin’s Answers

January 13, 2010 · 1 Comment

googleanswers

This image might just have the highest picture-to-idea-relevance percentage of all time.  Bless you, Flickr (and your benevolent Yahoolian overlords).

Tangent #1:

Craig’s List.  Judy’s Book.  Angie’s List.  Tom’s Hardware. 

Adding a first name to your website’s name is a very effective tactic, especially in a situation with the following criteria:

  • The site is geared towards helping a user make a decision
  • Trust/authenticity is an important aspect of the decision-making process
  • A lo-fi solution is “good enough”, and in fact may be a better overall experience than expensive, flashy sites

Tangent #2:

In case you’ve been living under a rock, Google recently launched a new phone called the Nexus One.  One of the more intriguing aspects of the phone is the release of a new Android feature that adds voice-to-text transcription to any text field.  I fully expect this feature to be standard fare for modern smartphone OSs within the next year.

One important implication of this change is the formatting of the average search query.  Instead of using a series of keywords as a search query (e.g. “Chinese Restaurant Seattle Yelp”), users who are speaking are more likely to ask direct questions as opposed to playing keyword bingo (e.g. “Where is the best Chinese restaurant within five miles?”).

Tangent #3:

If you haven’t yet read the mind-bending Wired article on a company called Demand Media (and Mike Arrington’s poignant response), you really ought to.  As a passionate wantrepreneur, I’m incredibly torn by the rise of the “Fast Food Content” movement.  On one hand, I hate to see thoughtful, hand-crafted content get overrun by lowest-common-denominator drivel.  On the other hand, I can’t help but get excited for the potential business opportunities this new model could unlock.

The Idea:

So with the devil seated firmly on my shoulder, I offer up today’s idea: Kevin’s Answers.  The idea is a very simple site that provides short, specific, and direct answers to the most popular questions currently being asked on search engines. 

While algorithms could be employed to determine the right questions and to help with the answers, one aspect that would set this site apart would be the fact that every single question on the site would be answered/approved by the owner of the site (that’s me!).  Hence, the “Kevin” in “Kevin’s Answers”.

Would this take an extremely long time to build?  Of course.  But that’s the competitive advantage.  The key here is that the site would be so simple to set up that any development time would be replaced by hours upon hours of researching and answering questions. The site would be ad-supported, and as time went on, the unique brand and difficult-to-compile content would potentially become an acquisition target for a search company looking to improve their semantic search capabilities.

What do you think of this idea?  Would love to hear thoughts on this one.  And if you want to know what I’m eating for lunch every day (spaghetti with meatballs!), you can follow me on Twitter here: @astartupaday

Categories: Uncategorized

2010, Micropayments, and You

January 13, 2010 · 1 Comment

“It’s tough to make predictions, especially about the future.” – Yogi Berra

A few weeks ago I sat down to join my blogging brethren for the obligatory “My Predictions for 2010” post.  However, after seeing the strain that the massive influx of self-importantism was putting on the Internet as a whole, I decided to scrap my post and wait for the dust to settle a bit before adding my two cents.

That, and I was too lazy to finish the post before leaving for my holiday break.  Which was amazing, by the way, thanks for asking.

After looking at my predictions for the sweeping waves of change that were bearing down upon the new year, I decided to take a step back and focus on the one big prediction that I think will have a major impact on both the company in question and the startup ecosystem as a whole:

Facebook will successfully launch a micropayments system

Facebook has done a fairly good job making money to date, but in order to go public, they need to tap into something much more significant.  One big opportunity they have in front of them is to roll out a payment system that will compete head-to-head with PayPal.  It will enable trusted payments between people in your social graph, 1-click payments on sites that have Facebook Connect installed, and (eventually, and crucially) mobile payments via your phone. 

This will be Facebook’s “AdSense” moment

Just like AdSense paved the way for the ad-based revenue model that fueled the Web 2.0 movement, I believe that Facebook’s micropayment model will have the same effect on the web startup landscape in 2010 and 2011.  Here are a few examples of how this new model might work:

  • Online news sites can switch from a subscription model to a micropayment-per-article model.  This will especially help smaller niche news sites that don’t publish enough articles to justify a monthly subscription fee.
  • SaaS providers can offer a pay-per-use model.  For example, SlideShare could allow users to post 50 slides for free, and charge $.05/slide for each additional upload.
  • Micropayments could be used as a quality bar to prevent spam for user generated content.  For example, if a popular blog could charge $.01/comment, it might be enough to make the “FIRST!!!!11!ONE” morons think twice before posting.
  • Q and A sites can use micropayments as incentive to get quality answers to questions.  For example, I might offer $.25 to someone who can tell me the name of the grammar robot that Lisa Simpson invented in S12E18: Trilogy of Error.

These are just a few quick examples off the top of my head, and as with any new innovation, the most interesting ones are yet to be discovered.

OK, so there’s my big prediction.  Keep an eye out on April 21-22, at Facebook’s annual dev conference, to see if/how this one is going to pan out. 

If you want the opportunity to laugh in my face on 1/1/11 when this one doesn’t come true, you should probably follow me on Twitter at @astartupaday.

Categories: Uncategorized