RE: the out-of-Africa-losing-consonants theory: there is a major problem, in that languages gain and lose complexity all the time, following what are, as best we can tell, random trends. Aside from the examples of complex sound systems pointed out already, how should we control for these random trends? It's possible that complexity in sound systems rises and falls over time, and we have very little data for the vast majority of the human languages that have ever been spoken.
I have come across another theory that may be relevant: as a general rule, languages spoken by small groups of people tend more towards complex, inflected grammars and unusual sounds, while languages with many speakers tend more towards isolating grammars with simpler sounds. This is hypothesized to happen because people living in small groups already know each other very well and share a lot of background information, so many of the referents in conversation can be easily inferred from inflections and context (think of how often you can be really vague and still be understood by your friends and family). Additionally, since smaller languages are rarely learned by outsiders, they face little pressure to become more grammatically transparent or pronounceable for the benefit of adult learners. Larger languages have generally gone through a period where they became the language of hegemony over an extended area and assimilated large numbers of people speaking other dialects or completely different languages, and so have had some of the rough edges filed down, so to speak. Speakers of larger languages are more likely to run into strangers with strange accents and little shared background, so more information must be spelled out, typically by relying on word order and auxiliary words to carry grammatical meaning. A paper analyzing grammatical features vs. speaker population size can be found here:
http://www.helsinki.fi/~ksinnema/Complexity_PopSize_web.pdf
It's also worth mentioning that creoles, whose whole reason for existence is to provide a means of communication between people of very different backgrounds, tend to lack features that are obnoxious for outsiders to learn, such as complex systems of inflection, tones, and irregular forms.
As far as the OP goes, the fact that clicks aren't widespread may indicate that they are a product of the Khoisan peoples' isolation and the small number of speakers per language. Reconstructions of how click sounds evolved tend to point to their having originally arisen from complex sequences of ordinary consonants, which means they may be a linguistic device, like inflection, that simplifies communication for people who have a common background but complicates it for people who don't. While some Bantu languages have gained clicks, some Khoisan languages have also lost them due to Bantu influence. This suggests to me that clicks are a feature that wouldn't travel well, even as part of a larger, hegemonic language.