This is a basic overview of SIP in a VoIP context, so it may be too general for VoIP experts. You have been warned.
What is SIP?
SIP stands for Session Initiation Protocol, but what does that mean?
SIP is a signaling protocol used to set up, modify, and end a connection (a session) between two or more phones, or endpoints. It is used primarily for voice calls, but it can also set up multimedia sessions, such as video conferences, and it runs over both IPv4 and IPv6.
How SIP Works
To show how SIP works, let's back up and talk about VoIP networking. When you place a call, you are not immediately connected to the other person's phone, or endpoint. First, a SIP request is sent to set up the connection with the other endpoint. Once the call is established, a different protocol, usually RTP (Real-time Transport Protocol), carries your conversation over the wire.
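To make "a SIP request" a little more concrete, here is a minimal sketch of the text a phone might send to start a call: an INVITE request. SIP messages are plain text, with a request line and headers much like HTTP. The function name, URIs, and addresses are hypothetical, and the message body is left empty; a real INVITE usually carries a description of the media to use, and RTP then carries the audio itself.

```python
import uuid


def build_invite(caller: str, callee: str, local_host: str, local_port: int = 5060) -> str:
    """Build a minimal SIP INVITE request (headers only, no body) as text.

    caller and callee are SIP URIs such as "sip:alice@example.com".
    local_host and local_port are where this (hypothetical) phone listens.
    """
    call_id = uuid.uuid4().hex                 # identifies this call
    tag = uuid.uuid4().hex[:8]                 # identifies the caller's side of the dialog
    branch = "z9hG4bK" + uuid.uuid4().hex[:12] # RFC 3261 branch IDs start with this magic cookie
    user_part = caller.split("@")[0]           # e.g. "sip:alice"
    lines = [
        f"INVITE {callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{callee}>",
        f"From: <{caller}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <{user_part}@{local_host}:{local_port}>",
        "Content-Length: 0",
    ]
    # Like HTTP, SIP ends each header line with CRLF and ends the headers with a blank line.
    return "\r\n".join(lines) + "\r\n\r\n"


print(build_invite("sip:alice@example.com", "sip:bob@example.com", "192.0.2.10"))
```

The request line names the method (INVITE) and the callee, and the headers say who is calling, how to route replies back, and which call this is.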
When I hit send on my phone, there is a delay before the other phone starts ringing. That delay is SIP setting up the connection to the other endpoint. This is similar to the way HTTP works. If you go to a website you've never visited before, there's a slight delay before it starts loading, because the browser first has to establish a connection with the server. If you've never noticed that pause, try visiting a website hosted in another country or across an ocean.
Like HTTP, SIP sends a request, waits for a response, and then acts on that response. For example, SIP could say, "This endpoint is available; let the call ring through," or "This endpoint is disconnected; play the disconnect message," depending on the response it receives. The resemblance to HTTP goes further: SIP response codes are modeled on HTTP's, so 200 OK means success and 404 Not Found means the user couldn't be found, while SIP adds codes of its own, such as 180 Ringing and 486 Busy Here.
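The "make a decision based on the response" step can be sketched by looking at the first digit of the status code, since SIP groups its responses into classes (1xx provisional, 2xx success, 3xx redirection, 4xx to 6xx failures). The function name and the action strings below are illustrative assumptions, not part of any real SIP library.

```python
# A few well-known SIP status codes and their reason phrases.
REASONS = {
    100: "Trying",
    180: "Ringing",
    200: "OK",
    404: "Not Found",
    486: "Busy Here",
    603: "Decline",
}


def next_action(status: int) -> str:
    """Decide what a calling phone might do next, based on the response class."""
    if 100 <= status <= 199:
        return "wait"      # provisional: request received, final answer still coming
    if 200 <= status <= 299:
        return "connect"   # success: the call was answered
    if 300 <= status <= 399:
        return "redirect"  # redirection: try the address(es) given in the response
    if 400 <= status <= 699:
        return "fail"      # client, server, or global failure: end the attempt
    raise ValueError(f"not a SIP status code: {status}")


for code in (100, 180, 200, 486):
    print(code, REASONS[code], next_action(code))
```

A typical successful call sees a sequence like 100 Trying, 180 Ringing, then 200 OK, which is exactly the "wait, wait, connect" pattern above.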
If you have established your call and then decide to conference in another person, you use SIP again to establish a connection to the new phone.
SIP and codecs
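Because SIP sets up the connection between endpoints, it makes sense that SIP also works out what quality level each endpoint can handle and then chooses an appropriate level for the call. In practice, SIP messages carry a Session Description Protocol (SDP) body in which the endpoints describe the media formats they support.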
You may have noticed that when you connect to another phone, the quality of your call is limited by what the other phone can handle. If the other phone can't handle much data at one time, your audio is compressed to a suitable level before it is sent. A common negotiation between two phones is agreeing on a codec. A codec compresses (encodes) the audio at one endpoint and decompresses (decodes) it at the other, so selecting a codec that both endpoints support is key to actually understanding each other.
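Here is a simplified sketch of the idea behind that negotiation: the caller offers a list of codecs in order of preference, and the callee picks the first one it also supports. This is only the core idea; real SDP offers identify codecs by payload type numbers with separate mapping lines, and an answer may accept several codecs. The function name and codec lists are illustrative.

```python
from typing import Optional


def negotiate_codec(offered: list[str], supported: set[str]) -> Optional[str]:
    """Return the caller's most-preferred codec that the callee also supports.

    Returns None when the two endpoints have no codec in common,
    in which case the call cannot carry audio they both understand.
    """
    # Codec names are compared case-insensitively ("PCMU" == "pcmu").
    supported_lower = {codec.lower() for codec in supported}
    for codec in offered:  # offered is ordered by the caller's preference
        if codec.lower() in supported_lower:
            return codec
    return None


caller_offer = ["opus", "G722", "PCMU"]
callee_codecs = {"PCMU", "PCMA"}
print(negotiate_codec(caller_offer, callee_codecs))
```

Here the caller would prefer opus or G722, but the callee only understands the older PCMU and PCMA codecs, so the call falls back to PCMU.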
SIP proxies and registration
While phones can connect directly through SIP, they generally route their SIP traffic through a SIP proxy, which handles the signaling on their behalf. Say you're at a large VoIP site and you want to call a coworker by their extension. Assuming that network uses SIP, the request is probably routed through one or more SIP proxies. The proxy translates the extension into a SIP URI (an address such as sip:bob@example.com) and connects the two endpoints.
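The extension-to-URI translation a proxy performs can be sketched as a tiny dial plan. Everything here is an assumption for illustration: the domain, the rule that internal extensions are three to five digits, and the function name. Real proxies are configured with their own, usually much richer, routing rules.

```python
import re

SIP_DOMAIN = "voip.example.com"  # hypothetical SIP domain for this office


def extension_to_uri(dialed: str, domain: str = SIP_DOMAIN) -> str:
    """Translate a dialed internal extension into a SIP URI, as a proxy's dial plan might."""
    dialed = dialed.strip()
    # Assumed rule: internal extensions are 3 to 5 digits; anything else is rejected here.
    if not re.fullmatch(r"\d{3,5}", dialed):
        raise ValueError(f"not an internal extension: {dialed!r}")
    return f"sip:{dialed}@{domain}"


print(extension_to_uri("4021"))
```

Once the proxy has a SIP URI, it still needs to know where the phone behind that URI currently is, which is the job of registration.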
Going hand-in-hand with SIP proxies is SIP registration. Similar to a DNS server, a SIP registrar acts as a location service for devices using SIP. Each phone sends a REGISTER request that binds its SIP URI to its current network address, so when a call comes in for that URI, the proxy can look up where to deliver it instead of having to search for the phone. The registrar is frequently co-located with a SIP proxy.
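A registrar's location service can be sketched as a lookup table with expiry times: a REGISTER adds or refreshes a binding from a SIP URI (the "address of record") to the phone's current contact address, and a proxy looks that binding up when a call arrives. This toy class is a simplification under stated assumptions: it keeps only one contact per URI and skips authentication, both of which real registrars handle.

```python
import time
from typing import Optional


class Registrar:
    """A toy SIP registrar mapping a SIP URI to where that phone can be reached now."""

    def __init__(self) -> None:
        # address of record -> (contact address, expiry time on the monotonic clock)
        self._bindings: dict[str, tuple[str, float]] = {}

    def register(self, aor: str, contact: str, expires: int = 3600) -> None:
        # A REGISTER binds the URI to a contact for `expires` seconds
        # (0 means the binding expires immediately, i.e. unregister).
        self._bindings[aor] = (contact, time.monotonic() + expires)

    def lookup(self, aor: str) -> Optional[str]:
        # A proxy asks: where should a request for this URI be sent right now?
        entry = self._bindings.get(aor)
        if entry is None:
            return None
        contact, expiry = entry
        if time.monotonic() >= expiry:
            del self._bindings[aor]  # binding expired; the phone must register again
            return None
        return contact


registrar = Registrar()
registrar.register("sip:bob@example.com", "sip:bob@192.0.2.25:5060")
print(registrar.lookup("sip:bob@example.com"))
```

Phones re-register before their binding expires, which is how the registrar keeps up with phones that change networks or IP addresses.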
As with most things, especially networking protocols, there is more to SIP than this, but that is for another post.