I will start with how the human voice works, as it’s easier to understand and works on the same principle as a vocoder. There are two main parts involved in speaking: the vocal cords and the mouth (specifically the way the tongue and lips move). The vocal cords really only make a buzzing noise. This noise is unique to each person, and beyond its pitch we control little more than its volume, by pushing more air past the cords or making them vibrate harder. At this point you could call it a basic waveform. This waveform then travels through the mouth, where (like a filter) we sculpt it into the sounds of the language we speak using the tongue and lips.
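To make the voice analogy concrete, here is a rough Python sketch of the same idea (standard library only). A pulse train stands in for the vocal cords, and a pair of two-pole resonators stands in for the mouth’s filtering. This is the textbook source-filter simplification rather than a model of real speech. The sample rate, pitch, and resonance frequencies and bandwidths are assumed values picked purely for illustration.

```python
import math

SAMPLE_RATE = 16000  # assumed sample rate in Hz

def pulse_train(f0, n, sr=SAMPLE_RATE):
    """Crude 'vocal cord' source: one impulse per pitch period."""
    period = int(sr / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(x, freq, bandwidth, sr=SAMPLE_RATE):
    """Two-pole resonant filter standing in for one resonance of the mouth."""
    r = math.exp(-math.pi * bandwidth / sr)   # pole radius from bandwidth
    theta = 2 * math.pi * freq / sr           # pole angle from centre frequency
    a1 = -2 * r * math.cos(theta)
    a2 = r * r
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s - a1 * y1 - a2 * y2             # y[n] = x[n] - a1*y[n-1] - a2*y[n-2]
        out.append(y)
        y2, y1 = y1, y
    return out

source = pulse_train(f0=120, n=1600)          # the 'buzz' at an assumed 120 Hz pitch
# Two resonances roughly where an "ah" vowel has them (assumed values).
vowel = resonator(resonator(source, 700, 110), 1200, 120)
```

Changing f0 changes the pitch of the buzz, while changing the resonator frequencies changes which vowel-like sound comes out. That is the same split of jobs a vocoder makes between carrier and modulator.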
A vocoder does pretty much the same thing: it uses a carrier signal and a modulator signal in place of the vocal cords and mouth. The carrier, commonly a synth, is the signal that gets modulated. The modulator is commonly speech, for the classic ‘robot talking’ sound, but it can be any other signal source that gets you the sound you are going for (this is fun to experiment with).
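A carrier works best when it is harmonically rich, because a band with no carrier energy in it stays silent however loud the voice is in that band. That is why buzzy waveforms such as a sawtooth are a common choice. Below is a minimal Python sketch of such a carrier. The 16 kHz sample rate and 110 Hz pitch are assumptions. This naive oscillator also aliases at high pitches, so a real synth would use a band-limited one.

```python
SAMPLE_RATE = 16000  # assumed sample rate in Hz

def sawtooth(freq, n, sr=SAMPLE_RATE):
    """Naive (non-band-limited) sawtooth ramping from -1 toward +1 each cycle."""
    phase = 0.0
    out = []
    for _ in range(n):
        out.append(2.0 * phase - 1.0)   # map phase 0..1 to -1..+1
        phase += freq / sr              # advance by one sample's worth of cycle
        if phase >= 1.0:
            phase -= 1.0                # wrap at the end of each cycle
    return out

carrier = sawtooth(110.0, SAMPLE_RATE)  # one second of an assumed 110 Hz synth note
```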
Getting more specific, the carrier goes through a bank of band-pass filters (I will stick to 8 for this guide), which split it into 8 frequency regions before the modulation circuit. The ‘modulator’ or ‘voice’ (I will call it the ‘voice’ for the rest of this guide) goes through a second set of 8 band-pass filters, which separate the voice into the same frequency groups as the carrier. Each frequency group of the voice then goes into its own envelope follower, which extracts the overall amplitude of that group as a control voltage. The carrier’s frequency groups each go into a VCA (voltage-controlled amplifier), one per filter group. The 8 VCAs are amplitude-modulated by the 8 envelope followers, so whatever amplitude the voice has in the lowest band is imposed on the carrier in the lowest band. The same happens in every filter channel.
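One channel of that chain can be sketched in Python as below. The envelope follower rectifies the band and smooths it with separate attack and release times. The VCA simply multiplies the carrier band by that envelope. The attack/release times, sample rate and test tones are assumptions for illustration, and analog envelope followers differ in how they rectify and smooth.

```python
import math

SAMPLE_RATE = 16000  # assumed sample rate in Hz

def envelope_follower(x, attack_ms=5.0, release_ms=30.0, sr=SAMPLE_RATE):
    """Full-wave rectify, then smooth with separate attack/release one-pole filters."""
    a_att = math.exp(-1.0 / (sr * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (sr * release_ms / 1000.0))
    env = 0.0
    out = []
    for s in x:
        level = abs(s)                               # rectify
        coeff = a_att if level > env else a_rel      # rise fast, fall slower
        env = coeff * env + (1.0 - coeff) * level    # one-pole smoothing
        out.append(env)
    return out

def vca(signal, control):
    """Voltage-controlled amplifier: the control envelope sets the gain per sample."""
    return [s * c for s, c in zip(signal, control)]

# Example: a 300 Hz "voice" band that stops halfway shapes a 310 Hz carrier band.
voice_band = [math.sin(2 * math.pi * 300 * i / SAMPLE_RATE) if i < 8000 else 0.0
              for i in range(16000)]
carrier_band = [math.sin(2 * math.pi * 310 * i / SAMPLE_RATE) for i in range(16000)]
shaped = vca(carrier_band, envelope_follower(voice_band))
```

The carrier band keeps its own pitch but only sounds while the voice band is active, and it fades out over the release time once the voice stops.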
After the carrier has been modulated by the voice, the 8 filter groups are summed back into one channel of audio (or two channels if you are using it in stereo). This summed channel is the robot voice created by a vocoder.
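Putting the whole chain together, here is a minimal 8-band vocoder sketch in plain Python. It has matching band-pass filters for carrier and voice, an envelope follower per voice band, a multiply standing in for each VCA, and a running sum that merges the bands back into one channel. The band-pass filters use the well-known RBJ "Audio EQ Cookbook" biquad. The 200 Hz to 5 kHz range, logarithmic band spacing, Q formula and smoothing time are assumptions rather than the values of any particular analog unit.

```python
import math

SAMPLE_RATE = 16000  # assumed sample rate in Hz

def bandpass(x, center, q, sr=SAMPLE_RATE):
    """RBJ-cookbook biquad band-pass (0 dB peak gain), direct form I."""
    w0 = 2 * math.pi * center / sr
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = b0 * s + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, s
        y2, y1 = y1, y
    return out

def envelope(x, smoothing_ms=20.0, sr=SAMPLE_RATE):
    """Simple envelope follower: rectify, then smooth with a one-pole low-pass."""
    a = math.exp(-1.0 / (sr * smoothing_ms / 1000.0))
    env, out = 0.0, []
    for s in x:
        env = a * env + (1 - a) * abs(s)
        out.append(env)
    return out

def vocode(carrier, voice, bands=8, lo=200.0, hi=5000.0):
    ratio = (hi / lo) ** (1.0 / (bands - 1))   # geometric spacing of band centres
    q = math.sqrt(ratio) / (ratio - 1)          # -3 dB edges of neighbours roughly meet
    n = min(len(carrier), len(voice))
    mix = [0.0] * n
    for k in range(bands):
        fc = lo * ratio ** k
        env = envelope(bandpass(voice[:n], fc, q))   # voice filter + envelope follower
        band = bandpass(carrier[:n], fc, q)          # matching carrier filter
        for i in range(n):
            mix[i] += band[i] * env[i]               # VCA, then sum into one channel
    return mix
```

Pass two lists of samples at the same rate, for example a synth note as the carrier and a recorded phrase as the voice. For stereo you would run it once per channel.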
More frequency bands give you finer frequency separation, which results in more clarity. 8 bands was the common count in the analog vocoders of the ’70s and ’80s, though some were capable of 16.
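To put numbers on the band-count trade-off, the small Python sketch below splits an assumed 100 Hz to 8 kHz range into equal slices on a log scale and prints how wide each band is in octaves. Both the range and the equal log spacing are assumptions, and real units spaced their filters in different ways. Doubling the band count halves each band’s width, so the voice’s spectral shape is tracked in finer steps.

```python
import math

def band_edges(bands, lo=100.0, hi=8000.0):
    """Split lo..hi into `bands` slices of equal width on a log (octave) scale."""
    ratio = (hi / lo) ** (1.0 / bands)
    return [lo * ratio ** k for k in range(bands + 1)]

for n in (8, 16):
    edges = band_edges(n)
    octaves = math.log2(edges[1] / edges[0])
    print(f"{n} bands: each about {octaves:.2f} octaves wide")
# 8 bands: each about 0.79 octaves wide
# 16 bands: each about 0.40 octaves wide
```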
For those who can read schematics, here's a simple vocoder schematic.

Now, going back to my ‘reverb doesn’t lessen clarity when used on the carrier signal’ argument: I hope you can see that reverb on the carrier does not create a time-based effect the way it traditionally would. The reverb would only add stereo phasing noise to the carrier, which is then amplitude-modulated by the voice. It adds space by creating phase differences between the stereo channels, but it does not create a lasting tail, because its amplitude is dictated by the voice: once the voice goes quiet in a band, the reverb in that band goes quiet with it. This of course assumes the right reverb settings. I use it to spread out only the high range and add noise to that region, with a short decay time and a moderate room size.
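Here is a toy Python version of this argument. A decaying noise burst stands in for the reverb tail on one carrier band, and a simple on/off gate stands in for that band’s voice envelope. The decay time, sample rate and gate length are assumptions. Once the gate closes, the output is silent even though the tail is still ringing underneath. A real envelope follower has a release time, so in practice the tail is cut off over that release rather than instantly.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed sample rate in Hz
random.seed(1)

decay_s = 1.5  # assumed (long) reverb decay time
# Stand-in for a reverb tail on one carrier band: noise falling ~60 dB over decay_s.
carrier_tail = [random.uniform(-1, 1) * math.exp(-i / (SAMPLE_RATE * decay_s / 6.9))
                for i in range(SAMPLE_RATE)]
# Stand-in for that band's voice envelope: the word lasts 0.25 s, then silence.
voice_env = [1.0 if i < SAMPLE_RATE // 4 else 0.0 for i in range(SAMPLE_RATE)]
# The VCA: the voice envelope decides how much of the tail gets through.
output = [c * e for c, e in zip(carrier_tail, voice_env)]
```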