I have two answers. First, why a sudden movement in a speaker cone "sounds like" a click. Second, how a series capacitor affects the movement of the speaker.
Assuming your speaker is connected directly to the output of the 555, without a coupling capacitor (which I hope is not the case), each time the 555 output changes state, the speaker cone moves rapidly from one steady position to another, as others have already described in this thread. A positive voltage makes it jump outwards, and when the voltage disappears, it jumps back inwards. (Or the opposite, depending on which way round the speaker is connected.) Each of those changes in the cone's position sounds like a click.
Why does it sound like a click? Why does it seem to have a certain duration? It's because of a general phenomenon called high-pass filtering, in which high frequencies are passed and low frequencies are attenuated.
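If it helps to see the idea in numbers, here is a small Python sketch of the simplest high-pass shape, an ideal first-order filter. The 100 Hz cutoff is an arbitrary value picked for illustration, and the air coupling described below is not literally a first-order filter; this only shows the general pattern of low frequencies being attenuated while high ones pass.

```python
import math


def highpass_gain(f_hz: float, fc_hz: float) -> float:
    """Magnitude of an ideal first-order high-pass filter at f_hz.

    |H(f)| = (f/fc) / sqrt(1 + (f/fc)^2): close to 0 well below fc,
    close to 1 well above fc, and 1/sqrt(2) (about -3 dB) at fc.
    """
    ratio = f_hz / fc_hz
    return ratio / math.sqrt(1.0 + ratio * ratio)


def gain_db(gain: float) -> float:
    """Convert a voltage gain ratio to decibels."""
    return 20.0 * math.log10(gain)


# Hypothetical cutoff frequency, chosen only for illustration.
FC = 100.0
for f in (1.0, 10.0, 100.0, 1_000.0, 10_000.0):
    g = highpass_gain(f, FC)
    print(f"{f:>8.0f} Hz: gain {g:.4f} ({gain_db(g):6.1f} dB)")
```

With these assumed numbers, 1 Hz comes out about 40 dB down, while 1 kHz loses less than 0.1 dB.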
When you listen to a loudspeaker, the sound is passed from the cone to your ear by movement of air molecules. There is an inherent high-pass effect in this coupling. If a speaker cone is reproducing a 1 kHz tone, it makes the neighbouring air molecules vibrate back and forth quickly; this vibration is passed on to other air molecules, so the sound travels outwards from the speaker until it reaches your ear and you hear it as a tone.
But if that speaker cone were moving smoothly at, say, 1 Hz (once in and out every second), with no jumps, the air that moves in response to it would be moving comparatively slowly, and that movement dissipates easily into the surrounding air. If you're a metre away from the speaker, for example, little if any air movement is present at your ear - it has all dissipated into the surrounding air, because the air molecules were moving so slowly.
(Your ear doesn't respond well to frequencies below about 20 Hz, and your brain doesn't interpret them as sound, but that's beside the point for this explanation.)
At the instant when the speaker cone jumps from its rest position to a new position, air moves quickly and that movement travels to your ear, where you hear it. But the cone stops moving, so effectively there is just one little burst of air pressure, and the pressure evens out again after a short time. This sounds like a click or a thud, depending on how long it takes for the air pressure to flatten out again.
That is essentially why a speaker cone jumping from one position to another sounds like a click.
You may have used headphones that couple directly into your ear canal, or at least sit very close to it. These can have amazingly strong low-frequency response because they are so tightly coupled to your eardrum. If you connect one of those to a battery through a resistor (to limit the current, so that neither the headphone nor your eardrum is damaged), you will hear a "click" in its true glory. It sounds more like a thud or a bang, and is accompanied by a feeling of physical pressure on your eardrum. (It's probably unwise to do this too many times!)
But when you take the earphone out and listen to it from a distance, no bass is audible at all. That's because the low frequencies present at the earphone's cone dissipate easily into the surrounding air. If you want low frequencies (slow-moving air) to travel through the air in any significant way, you need a bigger cone to move a larger volume of air. Have you seen a 15-inch subwoofer? Subwoofers also need a cabinet, to stop the slow-moving air from taking a short local path from the front of the cone round to the back, so that it's forced to travel outwards from the speaker.
The answers so far have assumed that the speaker is connected directly to the output of the 555. You didn't say so, but I expect your speaker is connected through a capacitor. (It should be, otherwise you might damage the 555 and/or the speaker because significant current will flow when the output is high.)
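To put a rough number on "significant current", here is a back-of-envelope Ohm's-law sketch in Python. The 9 V supply and 8 Ω coil resistance are hypothetical values, and the roughly 200 mA limit is the figure commonly quoted for a standard bipolar 555; check your own part's datasheet.

```python
# Hypothetical values for illustration only.
SUPPLY_V = 9.0            # supply voltage, assumed
COIL_OHMS = 8.0           # speaker voice-coil DC resistance, assumed
NE555_APPROX_MAX_A = 0.2  # commonly quoted 555 output limit; check your datasheet


def dc_current(supply_v: float, coil_ohms: float) -> float:
    """Ohm's law: steady current while the output sits high with no capacitor."""
    return supply_v / coil_ohms


i = dc_current(SUPPLY_V, COIL_OHMS)
p = SUPPLY_V * i  # power dissipated in the coil, in watts
print(f"current {i:.3f} A, coil power {p:.1f} W")
print("exceeds 555 rating" if i > NE555_APPROX_MAX_A else "within 555 rating")
```

This ignores the voltage dropped by the 555's output stage, so the real current would be somewhat lower, but the point stands: it is several times more than the chip is meant to supply, and it flows for as long as the output stays high.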
When you have an output connected through a capacitor to a load, you get an effect that's loosely called "differentiation" (not EXACTLY the same as differentiation used in calculus), or that is described as a high-pass filter (i.e. a filter that attenuates low frequencies, like the natural filtering effect of the air). These circuits are normally explained with a resistor as the load, but a speaker behaves similarly enough for this explanation.
The effect of the capacitor in series with the signal is to couple fast-moving voltage changes more strongly than slow-moving ones. This is because the capacitor can respond to slow-moving changes by charging and discharging, effectively "following" the more gradual changes and leaving less of that slow-moving voltage on the other side. This is analogous to the behaviour of air: slow-moving changes are "absorbed" by the air, but fast-moving changes are coupled through it.
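For a concrete feel of where "slow" ends and "fast" begins, here is the standard cutoff frequency of a series-capacitor, shunt-resistor high-pass filter. The values R = 8 Ω (treating the speaker as a plain resistor) and C = 100 µF are assumptions for illustration only.

```latex
\begin{aligned}
\tau &= RC = (8\,\Omega)(100\,\mu\mathrm{F}) = 0.8\,\mathrm{ms} \\
f_c &= \frac{1}{2\pi RC} = \frac{1}{2\pi \times 0.8\,\mathrm{ms}} \approx 199\,\mathrm{Hz} \\
|H(f)| &= \frac{f/f_c}{\sqrt{1 + (f/f_c)^2}} \\
|H(20\,\mathrm{Hz})| &\approx 0.10 \;(\approx -20\,\mathrm{dB}), \qquad |H(1\,\mathrm{kHz})| \approx 0.98
\end{aligned}
```

A larger capacitor lowers f_c and lets more of the low-frequency content through.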
When the 555 output changes from 0 V to VCC, the capacitor-resistor differentiator / high-pass filter sees a positive "step change" at its input, i.e. its input changes instantaneously from one voltage to another, like a step. What you get at the output (i.e. across the speaker) is a sharp rise in voltage followed by a tail-off as the voltage settles back to zero. It looks a bit like a sawtooth, but instead of falling in a straight line, it decays along a curved (exponential) path that is steep at first and then flattens out. That decay represents the charging of the capacitor. This would be easier with diagrams; if you're interested, you might want to google some of the keywords I've used here.
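The shape of that tail-off is an exponential decay with time constant τ = RC. Here is a minimal Python sketch, assuming the capacitor starts uncharged, the speaker behaves like a plain resistor, and hypothetical values of 9 V, 8 Ω and 100 µF:

```python
import math


def highpass_step_response(t_s: float, step_v: float,
                           r_ohms: float, c_farads: float) -> float:
    """Voltage across the load R of a series-C, shunt-R high-pass filter,
    t_s seconds after the input jumps by step_v volts, with the capacitor
    initially uncharged: v(t) = step_v * exp(-t / (R * C))."""
    tau = r_ohms * c_farads  # time constant in seconds
    return step_v * math.exp(-t_s / tau)


# Hypothetical values: 9 V step, 8-ohm speaker treated as a resistor, 100 uF.
VCC, R, C = 9.0, 8.0, 100e-6
tau = R * C  # 0.8 ms
for n in range(6):
    t = n * tau
    v = highpass_step_response(t, VCC, R, C)
    print(f"t = {t * 1000:.1f} ms ({n} tau): {v:.3f} V")
```

Each time constant (0.8 ms with these values) the voltage falls to about 37% of its previous value, so after about five time constants (4 ms) it is within 1% of zero.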
Once that pulse has finished, the capacitor has charged up to the supply voltage. When the 555 output returns low, the charge on the capacitor initially pulls the speaker below ground, i.e. negative. As the capacitor discharges, the voltage at the speaker returns to zero. The waveshape is like an upside-down sawtooth, with the same curved decay as before.
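The negative pulse follows from the fact that the voltage across a capacitor cannot change instantly. Writing v_C for the capacitor voltage (555 side minus speaker side), and assuming the capacitor had fully charged to V_CC before the falling edge at t = 0:

```latex
\begin{aligned}
v_{\text{out}}(t) &= v_{\text{in}}(t) - v_C(t) \\
v_{\text{out}}(0^+) &= 0 - V_{CC} = -V_{CC} \\
v_C(t) &= V_{CC}\, e^{-t/RC} \quad\Rightarrow\quad v_{\text{out}}(t) = -V_{CC}\, e^{-t/RC}, \qquad t \ge 0
\end{aligned}
```

This is the rising-edge pulse mirrored about 0 V. The "fully charged" assumption holds when each half-period of the 555 output is several time constants long.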
If you have access to an oscilloscope, try making your 555 oscillate at, say, 20 Hz and look at the waveform across the speaker. You will see a continuous 0 V line interrupted by short pulses going alternately positive and negative, each with the distinctive shape: a sharp initial edge, then a curved decay back to zero.
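If you don't have an oscilloscope, a rough simulation shows the same picture. This Python sketch feeds a 20 Hz square wave through a discrete-time approximation of the RC high-pass filter, using the common recurrence y[n] = a(y[n-1] + x[n] - x[n-1]). The 9 V, 8 Ω and 100 µF values, the 50% duty cycle and the 10 µs time step are all assumptions for illustration.

```python
def square_wave(freq_hz, high_v, dt_s, n_samples):
    """0 V / high_v square wave, 50% duty cycle, starting high.

    A real 555 astable is usually not exactly 50% duty; this is a simplification.
    """
    half = round(1.0 / (2.0 * freq_hz * dt_s))  # samples per half-period
    return [high_v if (n // half) % 2 == 0 else 0.0 for n in range(n_samples)]


def rc_highpass(x, dt_s, r_ohms, c_farads):
    """Discrete-time approximation of a series-C, shunt-R high-pass filter.

    y[n] = a * (y[n-1] + x[n] - x[n-1]),  a = RC / (RC + dt).
    The capacitor starts uncharged, so the first input sample appears
    entirely across R.
    """
    rc = r_ohms * c_farads
    a = rc / (rc + dt_s)
    y = [x[0]]
    for n in range(1, len(x)):
        y.append(a * (y[-1] + x[n] - x[n - 1]))
    return y


VCC, R, C = 9.0, 8.0, 100e-6            # hypothetical supply, speaker, capacitor
DT = 10e-6                               # 10 us time step
x = square_wave(20.0, VCC, DT, 10_000)   # 0.1 s = two periods at 20 Hz
y = rc_highpass(x, DT, R, C)

# Sample the output at edges, one time constant after edges,
# and mid-way through each half-period.
for ms in (0.0, 0.8, 12.5, 25.0, 25.8, 37.5, 50.0):
    n = round(ms * 1e-3 / DT)
    print(f"{ms:5.1f} ms: {y[n]:7.3f} V")
```

The output sits at essentially 0 V mid-way through each half-period and jumps to roughly +9 V or -9 V at each edge. The jumps after the first one come out slightly smaller than VCC (about 8.9 V) because of the finite time step; a smaller DT brings them closer.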
When the coupling capacitor is present, the speaker cone just jumps briefly outwards or inwards and quickly returns to its rest position. This sounds similar to the click produced by a step change in the cone position, but is generally less powerful, since the capacitor strips out much of the low-frequency content. It may sound more like a "tick" than a "click".