<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: How To Encode an MP3 Using Twitter</title>
	<atom:link href="http://astartupaday.com/2010/01/24/how-to-encode-an-mp3-using-twitter/feed/" rel="self" type="application/rss+xml" />
	<link>http://astartupaday.com/2010/01/24/how-to-encode-an-mp3-using-twitter/</link>
	<description>Each day I&#039;ll post an idea for a new Web 2.0 startup</description>
	<lastBuildDate>Mon, 30 Aug 2010 21:14:41 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: astartupaday</title>
		<link>http://astartupaday.com/2010/01/24/how-to-encode-an-mp3-using-twitter/#comment-4073</link>
		<dc:creator>astartupaday</dc:creator>
		<pubDate>Tue, 26 Jan 2010 21:44:48 +0000</pubDate>
		<guid isPermaLink="false">http://astartupaday.wordpress.com/2010/01/24/how-to-encode-an-mp3-using-twitter/#comment-4073</guid>
		<description>Hey Reinier,

This is why I love my blog - smart people taking time out of their day to debunk my crazy ideas!  One important thing to note, and I should have clarified this in my post, is that this was by no means intended as a *compression* algorithm; it is an *encoding* algorithm.

The metaphor I should have used is more like this.  Think about encoding a secret message in a book by using the first letter of each paragraph.  To encode the message &quot;This idea is crazy&quot;, I would need to find a page in a book where the first paragraph starts with &quot;T&quot;, the second starts with &quot;H&quot;, and so on...

Now, if I want to send this message to someone, I just need to tell them where in the book to start (e.g. War and Peace, page 217, paragraph 3).  While the full text of &#039;War and Peace&#039; is much, much larger than my message, I can send my message just by giving these simple instructions.

In my mind, where this idea breaks down is this question:

What is the mathematical probability that, for longer messages, you could find a book that has the exact letters laid out in sequential paragraphs?

For an arbitrarily small input (a single letter, for example) the probability is 1.  For an arbitrarily large input (20 billion letters), the probability is very close to 0.  The real question is, does that probability drop fast enough that encoding anything useful is not practical?  
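
A rough way to put numbers on that question, as a minimal Python sketch. It assumes (crudely) that every paragraph's first letter is an independent, uniformly random pick from 26 letters, and the billion candidate starting points are an arbitrary figure, not a measurement:

```python
import math

def match_probability(n_letters, n_starts, alphabet=26):
    """Chance that at least one of n_starts positions spells an n_letters message.

    Assumes independent, uniformly distributed first letters (a simplification).
    """
    p_single = float(alphabet) ** -n_letters   # chance that one given position matches
    # 1 - (1 - p_single) ** n_starts, computed stably for very small p_single
    return -math.expm1(n_starts * math.log1p(-p_single))

# Hypothetical library with a billion usable starting points
for n in (1, 5, 10, 20):
    print(n, match_probability(n, 10**9))
```

Under these assumptions the chance is still close to 1 for a five-letter message and already tiny for a ten-letter one, because each extra letter divides the odds of a single match by 26.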

My guess?  If you actually did the math, the probability of doing anything useful would likely be very, very small.  Small enough to make this whole debate moot.  

Your legal argument also probably holds water with most courts - which is why this becomes more of a hypothetical than something any sane person would set out to do.  :)</description>
		<content:encoded><![CDATA[<p>Hey Reinier,</p>
<p>This is why I love my blog &#8211; smart people taking time out of their day to debunk my crazy ideas!  One important thing to note, and I should have clarified this in my post, is that this was by no means intended as a *compression* algorithm; it is an *encoding* algorithm.</p>
<p>The metaphor I should have used is more like this.  Think about encoding a secret message in a book by using the first letter of each paragraph.  To encode the message &#8220;This idea is crazy&#8221;, I would need to find a page in a book where the first paragraph starts with &#8220;T&#8221;, the second starts with &#8220;H&#8221;, and so on&#8230;</p>
<p>Now, if I want to send this message to someone, I just need to tell them where in the book to start (e.g. War and Peace, page 217, paragraph 3).  While the full text of &#8216;War and Peace&#8217; is much, much larger than my message, I can send my message just by giving these simple instructions.</p>
<p>In my mind, where this idea breaks down is this question:</p>
<p>What is the mathematical probability that, for longer messages, you could find a book that has the exact letters laid out in sequential paragraphs?</p>
<p>For an arbitrarily small input (a single letter, for example) the probability is 1.  For an arbitrarily large input (20 billion letters), the probability is very close to 0.  The real question is, does that probability drop fast enough that encoding anything useful is not practical?  </p>
<p>My guess?  If you actually did the math, the probability of doing anything useful would likely be very, very small.  Small enough to make this whole debate moot.  </p>
<p>Your legal argument also probably holds water with most courts &#8211; which is why this becomes more of a hypothetical than something any sane person would set out to do.  <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ivo</title>
		<link>http://astartupaday.com/2010/01/24/how-to-encode-an-mp3-using-twitter/#comment-4072</link>
		<dc:creator>Ivo</dc:creator>
		<pubDate>Tue, 26 Jan 2010 21:25:58 +0000</pubDate>
		<guid isPermaLink="false">http://astartupaday.wordpress.com/2010/01/24/how-to-encode-an-mp3-using-twitter/#comment-4072</guid>
		<description>&quot;I haven’t done the math&quot;

And if you did, you would find that, on average, you&#039;d have to download 3 MB of tweets to uniquely represent a 3 MB mp3 file.

Secondly, you&#039;d have to search an infinite time to find a tweet sequence representing your 3MB file.</description>
		<content:encoded><![CDATA[<p>&#8220;I haven’t done the math&#8221;</p>
<p>And if you did, you would find that, on average, you&#8217;d have to download 3 MB of tweets to uniquely represent a 3 MB mp3 file.</p>
<p>Secondly, you&#8217;d have to search an infinite time to find a tweet sequence representing your 3MB file.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Reinier Zwitserloot</title>
		<link>http://astartupaday.com/2010/01/24/how-to-encode-an-mp3-using-twitter/#comment-4071</link>
		<dc:creator>Reinier Zwitserloot</dc:creator>
		<pubDate>Tue, 26 Jan 2010 17:06:46 +0000</pubDate>
		<guid isPermaLink="false">http://astartupaday.wordpress.com/2010/01/24/how-to-encode-an-mp3-using-twitter/#comment-4071</guid>
		<description>*COMPLETELY CRAZY*. In fact, I can mathematically prove to you that your concept cannot possibly work, at all, ever.

If you want to look into it beyond this comment, I suggest you look up the &quot;Pigeon Hole Principle&quot;, as well as any nutcase compression algorithm based on finding a piece of data&#039;s numerical representation in the digits of pi (which are widely believed, though not actually proven, to contain every finite digit sequence provided you expand them far enough). Those schemes run into the exact same issue you are going to run into.

Simply put:

You can&#039;t use X bits to represent more than at most 2^X different cases.

Let&#039;s set X to 8 for a moment. If I have a compression algorithm that can take *any* data that is 9 bits long and reduce it to 8 bits, then we can clearly say this compression algorithm is as impossible as 2+2 being equal to 3. After all, the compressed data has 2^8 = 256 different &#039;pigeon holes&#039;, but you&#039;ve got 2^9 = 512 different data files. You can&#039;t fit 2 pigeons in one hole, or your compression scheme would be ambiguous: a compressed file could be decompressed into 2 different files, which would make it useless.
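
A minimal Python sketch of that counting argument. The toy_compress function is made up for illustration and simply drops the last bit; any other function from 9 bits to 8 bits hits the same wall:

```python
from collections import Counter
from itertools import product

def toy_compress(bits):
    """Hypothetical 'compressor': keep the first 8 of the 9 input bits."""
    return bits[:8]

inputs = list(product((0, 1), repeat=9))             # every possible 9-bit file
holes = Counter(toy_compress(b) for b in inputs)     # how many files land in each output

print(len(inputs))           # 512 pigeons
print(len(holes))            # 256 pigeon holes
print(max(holes.values()))   # 2 files share a single compressed form
```

With 512 inputs and only 256 possible outputs, at least one output has to stand for more than one input, so it cannot be decompressed unambiguously.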

The key, then, is to check whether a scheme claims to compress random data just as well as non-random data; if it does, you know the pigeon hole principle applies and you haven&#039;t got a compression scheme at all, just one that reshuffles bits. And, usually, this mapping function actually serves to make the &#039;compressed&#039; data *BIGGER* than the original, as most of these mapping schemes, be it twitter or the digits of pi, contain redundancy.

Something like the scheme you started this post out with can be proven to fail spectacularly on random data. It specifically exploits repetition. Data with no repetition at all should not compress; in fact, to accommodate the table, it would become LARGER. This is a key factor in avoiding the pigeon hole principle - this HAS to be true for any compression algorithm, just like 2 + 2 has to be 4.


Regarding the RIAA trick: that won&#039;t fly either, and there are far easier ways to accomplish this.

Example scheme: Take a movie. XOR it with the compressed complete repository of Debian Linux. Without the Debian repository this data is utter random gibberish - in fact, you can take the same seemingly random stream of data and, provided you XOR it with the right stream of other data, get anything you want of that length. And yet, with one particular secondary input source (the Debian repo, which is free and open source), it &#039;magically&#039; turns into some copyrighted movie.
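
A small Python sketch of that XOR trick. The byte strings are stand-ins: random bytes play the role of the compressed Debian repository and a short text plays the role of the movie, so this only illustrates the arithmetic:

```python
import os

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

movie = b"pretend this is a movie"       # stand-in for the copyrighted file
pad = os.urandom(len(movie))             # stand-in for the repository bytes
blob = xor_bytes(movie, pad)             # what would actually get shared

# With the right second input the movie comes straight back.
assert xor_bytes(blob, pad) == movie

# With a different, deliberately built second input, the same blob yields anything else.
wanted = b"something else entirely".ljust(len(movie))[:len(movie)]
other_pad = xor_bytes(blob, wanted)
assert xor_bytes(blob, other_pad) == wanted
```

The blob alone carries no particular meaning; which file comes out depends entirely on the second stream it is combined with.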

It&#039;s been tried. It&#039;s sufficiently nuanced that a judge will either not understand it and fall back on the simple observation that by downloading this thing you can trivially get the copyrighted movie out of it, or, if he does understand the math involved, see it as a violation of the spirit of the law, which in cases like this is more than enough.

Hence, we arrive back at where I started: This entire post was *COMPLETELY CRAZY*.</description>
		<content:encoded><![CDATA[<p>*COMPLETELY CRAZY*. In fact, I can mathematically prove to you that your concept cannot possibly work, at all, ever.</p>
<p>If you want to look into it beyond this comment, I suggest you look up the &#8220;Pigeon Hole Principle&#8221;, as well as any nutcase compression algorithm based on finding a piece of data&#8217;s numerical representation in the digits of pi (which are widely believed, though not actually proven, to contain every finite digit sequence provided you expand them far enough). Those schemes run into the exact same issue you are going to run into.</p>
<p>Simply put:</p>
<p>You can&#8217;t use X bits to represent more than at most 2^X different cases.</p>
<p>Let&#8217;s set X to 8 for a moment. If I have a compression algorithm that can take *any* data that is 9 bits long and reduce it to 8 bits, then we can clearly say this compression algorithm is as impossible as 2+2 being equal to 3. After all, the compressed data has 2^8 = 256 different &#8216;pigeon holes&#8217;, but you&#8217;ve got 2^9 = 512 different data files. You can&#8217;t fit 2 pigeons in one hole, or your compression scheme would be ambiguous: a compressed file could be decompressed into 2 different files, which would make it useless.</p>
<p>The key, then, is to check whether a scheme claims to compress random data just as well as non-random data; if it does, you know the pigeon hole principle applies and you haven&#8217;t got a compression scheme at all, just one that reshuffles bits. And, usually, this mapping function actually serves to make the &#8216;compressed&#8217; data *BIGGER* than the original, as most of these mapping schemes, be it twitter or the digits of pi, contain redundancy.</p>
<p>Something like the scheme you started this post out with can be proven to fail spectacularly on random data. It specifically exploits repetition. Data with no repetition at all should not compress; in fact, to accommodate the table, it would become LARGER. This is a key factor in avoiding the pigeon hole principle &#8211; this HAS to be true for any compression algorithm, just like 2 + 2 has to be 4.</p>
<p>Regarding the RIAA trick: that won&#8217;t fly either, and there are far easier ways to accomplish this.</p>
<p>Example scheme: Take a movie. XOR it with the compressed complete repository of Debian Linux. Without the Debian repository this data is utter random gibberish &#8211; in fact, you can take the same seemingly random stream of data and, provided you XOR it with the right stream of other data, get anything you want of that length. And yet, with one particular secondary input source (the Debian repo, which is free and open source), it &#8216;magically&#8217; turns into some copyrighted movie.</p>
<p>It&#8217;s been tried. It&#8217;s sufficiently nuanced that a judge will either not understand it and fall back on the simple observation that by downloading this thing you can trivially get the copyrighted movie out of it, or, if he does understand the math involved, see it as a violation of the spirit of the law, which in cases like this is more than enough.</p>
<p>Hence, we arrive back at where I started: This entire post was *COMPLETELY CRAZY*.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
