We can represent a system as an ensemble of pure states $\{p_i, \psi_i\}$. Now, if you have exact knowledge of the system then it is for sure in a pure state, i.e., $\rho = |\psi\rangle\langle\psi|$. However, if we have classical uncertainty amongst the possible states. The system can be represented as a mixed state $\rho = \sum_{i} p_i\psi_i$ where the probabilities $p_i$ are classical in nature.
This formulation helps us a lot in dealing with quantum information, noisy systems and helps us represent measurements better, as well. Why? Because it provides a convenient means for describing quantum systems whose state is not completely known.
Probability that upon measurement the outcome is $m = p(m) = tr(M_m^\dagger M_m\rho)$ and the state of the system becomes as follows.
Measurement operators also follow the completeness equation, $\sum_m M_m^\dagger M_m = I$.
Theorem 1: An operator $\rho$ is a density operator if and only if it is both positive $(\rho = \rho^\dagger)$ and $tr(\rho) = 1$.
Converse is easy to prove. If an operator is both positive and has trace as one, then it shall have a spectral decomposition of the form $\sum_i\lambda_i|i\rangle\langle i|$.
For the direct proof, let us consider $\rho = \sum_i p_i tr(|\psi_i\rangle\langle\psi_i|) = 1$.
Theorem 2: The sets $|\tilde\psi\rangle$ and $|\tilde\phi\rangle$ generate the same density matrix if and only if,
where $u_{ij}$ is a unitary matrix of complex numbers, with indices $i$ and $j$, and we 'pad' whichever set of vectors $|\tilde\psi_i\rangle$ or $|\tilde\phi_j\rangle$ is smaller with additional vectors $0$ so that the two sets have the same number of elements.
Theorem 3: If $\rho$ is a density operator, then $\rho$ is a pure state if and only if $tr(\rho^2) = 1$ and mixed state if and only if $tr(\rho^2) < 1$.
Theorem 4: Observable $M$ has expectation $\sum_x \langle \psi_x|M|\psi_x\rangle = tr(M\rho)$.
This is the single-most important application of density operator formulation is the existence of reduced density operator. It is defined as follows.
This allows us to talk about sub-systems of a composite system.
Suppose $|\psi\rangle$ is a pure state of a composite system, $AB$. Then, there exists orthonormal states $|i_A\rangle$ for system $A$, and orthonormal states $|i_B\rangle$ of system $B$ such that
where $\lambda_i$ are non-negative real numbers satisfying $\sum_i \lambda_i^2 = 1$ known as Schmidt co-efficients.
Suppose we have a mixed state $\rho_A$ for a system $A$. Then, we can introduce another system $R$ such that $AR$ forms a pure state $|AR\rangle$ and $\rho_A = tr_R(|AR\rangle\langle AR|)$.
Given $\rho_A = \sum_i p_i |i_A\rangle\langle i_A|$, we shall have the following where $ |i_R\rangle$ are orthonormal basis states.