In Section 5.2, we discuss the source coding theorem. We first consider a source code, which consists of a source sequence X = (X_1, X_2, ..., X_n), an encoder, and a decoder. The encoder maps the random source sequence X to an index f(X) in an index set I = {1, 2, ..., M}. Such a code is called a block code, with n being the block length of the code. The encoder sends f(X) to the decoder through a noiseless channel. Based on the index received, the decoder outputs hat{X} as an estimate of X, called the reproduction sequence. The encoder is specified by the mapping f from X^n to the index set I. The rate of the code is R = (log M)/n in bits per source symbol, where M is the size of the index set and n is the block length; here we assume that the base of the logarithm is 2. If M, the size of the index set, is equal to the number of all possible source sequences, namely |X|^n, then the rate of the code is (1/n) log M = (1/n) log |X|^n = log |X|. For data compression, we typically take the rate R to be strictly less than log |X|. In such situations, because the number of indices is less than the total number of source sequences, the decoder may not be able to reproduce the source sequence correctly. An error occurs if hat{X} is not equal to X, and the probability that hat{X} is not equal to X, denoted by P_e, is called the error probability. The source coding theorem consists of two parts. The direct part says that for arbitrarily small error probability, there exists a block code whose coding rate is arbitrarily close to H(X) when n is sufficiently large. In other words, reliable communication can be achieved as long as the coding rate is essentially equal to H(X). The converse part of the source coding theorem says the following.
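The definitions above can be checked numerically. The following sketch (with an arbitrarily chosen binary alphabet and block length, not values from the lecture) computes the rate R = (log M)/n and confirms that when the index set covers every possible source sequence, i.e. M = |X|^n, the rate equals log |X|, so no compression takes place.

```python
import math

# Hypothetical parameters for illustration: a binary source (|X| = 2)
# with block length n = 100.
alphabet_size = 2
n = 100

def coding_rate(M, n):
    """Rate R = (log2 M) / n in bits per source symbol."""
    return math.log2(M) / n

# If the index set covers every possible source sequence, M = |X|^n
# and the rate equals log2 |X| (no compression).
M_full = alphabet_size ** n
print(coding_rate(M_full, n))  # -> 1.0, i.e. log2 |X|

# For compression we take R strictly less than log2 |X|, e.g. an index
# set of size M = 2^{nR} with a target rate R = 0.6 bit per symbol.
R_target = 0.6
M_compressed = 2 ** round(n * R_target)
print(coding_rate(M_compressed, n))  # -> 0.6
```

Since M_compressed is smaller than the number of source sequences, such a code necessarily has a positive error probability, which is the issue the two parts of the theorem address.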
Consider any block code with block length n and coding rate less than H(X) - zeta, where zeta is a positive quantity that does not change with n. Then the error probability tends to 1 as the block length tends to infinity. This part of the source coding theorem says that it is impossible to achieve reliable communication if the coding rate is less than H(X). To prove the direct part of the source coding theorem, for an arbitrarily small but fixed epsilon greater than zero, we need to construct a sequence of codes with block length n such that the error probability is less than epsilon when n is sufficiently large. We will consider a class of block codes with a particular structure: choose a subset A of X^n with |A| = M, let the encoder map each x in A to a distinct index f(x) in I, and map every x not in A to some fixed index, say 1. Upon receiving the index i, the decoder outputs the unique x in A such that f(x) = i. If the source sequence x is in A, then it is decoded correctly; if the source sequence x is not in A, then it is decoded incorrectly. Therefore, the error probability is equal to the probability that the random sequence X is not in A. We now prove the direct part of the source coding theorem. The main idea is to choose the set A suitably. Fix epsilon greater than zero and take A to be the weakly typical set W^n_{[X]epsilon}, so that M, the size of the index set, is equal to the size of A. Then for sufficiently large n, since M = |A| and A is the typical set, M is lower bounded by (1 - epsilon) 2^{n(H(X) - epsilon)} and upper bounded by 2^{n(H(X) + epsilon)}. Therefore, taking the logarithm and dividing by n, the coding rate (1/n) log M is lower bounded by (1/n) log(1 - epsilon) + H(X) - epsilon and upper bounded by H(X) + epsilon.
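The construction above can be carried out explicitly for a toy source. The sketch below (the Bernoulli parameter p, block length n, and epsilon are arbitrary illustrative choices, not from the lecture; n is far too small for the error probability to be small, but the structure of the code is visible) enumerates the weakly typical set, builds the encoder and decoder described above, and checks that the rate falls between the two bounds just derived.

```python
import itertools
import math

# Illustrative sketch: a Bernoulli(p) source with a small block length n,
# so the weakly typical set can be enumerated exactly. The parameters
# p, n, eps are arbitrary choices for demonstration.
p, n, eps = 0.2, 12, 0.15
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # entropy H(X) in bits

def log_prob(x):
    """log2 of the probability of the sequence x under the i.i.d. source."""
    k = sum(x)  # number of ones
    return k * math.log2(p) + (n - k) * math.log2(1 - p)

# Weakly typical set: sequences whose empirical entropy rate
# -(1/n) log2 p(x) is within eps of H(X).
typical = [x for x in itertools.product((0, 1), repeat=n)
           if abs(-log_prob(x) / n - H) <= eps]

# Block code with the structure described above: each typical x gets a
# distinct index in {1, ..., M}; atypical sequences are all sent as
# index 1 and hence decoded incorrectly.
M = len(typical)
encoder = {x: i + 1 for i, x in enumerate(typical)}
decoder = {i: x for x, i in encoder.items()}

R = math.log2(M) / n                               # coding rate
P_e = 1 - sum(2 ** log_prob(x) for x in typical)   # P(X not in A)

# The rate sits between the bounds from the direct part.
assert math.log2(1 - eps) / n + H - eps <= R <= H + eps
print(f"M = {M}, R = {R:.3f}, H(X) = {H:.3f}, P_e = {P_e:.3f}")
```

With n = 12 the error probability is still large; the content of the direct part is that as n grows, P_e falls below any prescribed epsilon while R stays close to H(X).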
Again, by the weak AEP, the probability of error, that is, the probability that X is not in the set A, which is equal to the probability that X is not typical, is less than epsilon. Letting epsilon tend to zero, the term (1/n) log(1 - epsilon), the epsilon terms in the rate bounds, and the bound on the error probability all vanish, so the coding rate (1/n) log M tends to H(X) while P_e tends to zero. For the converse, we will prove it for the class of block codes used in proving the direct part. For a general converse, please see problem two in the textbook. We now prove the converse part of the source coding theorem. Consider any block code whose rate is less than H(X) - zeta, that is, (1/n) log M < H(X) - zeta, where zeta > 0 does not change with n. Then the total number of indices M is less than 2^{n(H(X) - zeta)}. In general, some of the indices in I cover source sequences that are typical, while the others cover source sequences that are not typical. By the weak AEP, the total probability of the typical sequences covered is upper bounded by the number of such sequences, which is at most 2^{n(H(X) - zeta)}, times the probability of each such sequence, which is upper bounded by 2^{-n(H(X) - epsilon)}. This gives 2^{-n(zeta - epsilon)}, where the two H(X) terms in the exponents cancel each other. The total probability covered by the index set I, that is, the probability that X is in the set A, is upper bounded by the typical part, whose probability is at most 2^{-n(zeta - epsilon)}, plus the probability that X is not typical, which by the weak AEP is less than epsilon.
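The chain of inequalities just described can be written compactly as follows, writing W^n_{[X]epsilon} for the weakly typical set and A for the set of source sequences covered by the index set:

```latex
\begin{align*}
\Pr\{\mathbf{X} \in A\}
  &\le \bigl|A \cap W^n_{[X]\epsilon}\bigr| \cdot 2^{-n(H(X)-\epsilon)}
       + \Pr\bigl\{\mathbf{X} \notin W^n_{[X]\epsilon}\bigr\} \\
  &\le 2^{n(H(X)-\zeta)} \cdot 2^{-n(H(X)-\epsilon)} + \epsilon \\
  &=   2^{-n(\zeta-\epsilon)} + \epsilon ,
\end{align*}
```

where the cancellation of the two H(X) terms happens in the last step.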
Therefore, we obtain the upper bound 2^{-n(zeta - epsilon)} + epsilon on the total probability covered by the index set I. Then the probability of error, that is, the probability that X is not in A, is greater than 1 - 2^{-n(zeta - epsilon)} - epsilon, where this lower bound on P_e holds for any epsilon > 0 and n sufficiently large. By taking epsilon less than zeta, we have zeta - epsilon positive, so that when n is sufficiently large, 2^{-n(zeta - epsilon)} is less than epsilon; note that this choice of n depends on the value of epsilon. Therefore, for n sufficiently large, the probability of error is greater than 1 - 2 epsilon. Finally, letting epsilon tend to zero shows that P_e tends to 1, which finishes the proof.
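To see how quickly this lower bound on P_e approaches 1, the following sketch evaluates it for a few block lengths, using arbitrary illustrative values of zeta and epsilon (not from the lecture) with epsilon chosen less than zeta as in the proof.

```python
import math

# Illustrative values: zeta = 0.1 and eps = 0.05, so zeta - eps > 0.
zeta, eps = 0.1, 0.05

def pe_lower_bound(n):
    """Converse lower bound P_e >= 1 - 2^{-n(zeta - eps)} - eps."""
    return 1 - 2 ** (-n * (zeta - eps)) - eps

for n in (10, 100, 1000):
    print(n, round(pe_lower_bound(n), 4))
# As n grows, the bound increases toward 1 - eps; letting eps -> 0
# then forces P_e -> 1, which is the statement of the converse.
```

The bound is vacuous for small n but approaches 1 - epsilon geometrically fast, which is why the converse only makes an asymptotic claim.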