[plug] OT: Pool of IPs for testing load-balanced connections

Fri Sep 4 00:25:59 WST 2009

On Wed, Sep 2, 2009 at 8:47 PM, Adrian Woodley<Adrian at diskworld.com.au> wrote:
> Having worked at an ISP, I know all about "High Availability" and the
> likelihood of any given IP being actually available, especially when
> accessed off-net. :P
>
> The way the load-balancing works, it sets a static route for the target IP
> via the gateway being tested. Traffic for this test IP will always be
> directed out that gateway, even when other connections are available or the
> gateway being tested is down. This means that a common/important IP can't be
> used (e.g. google.com's A record).
>
> Because each router connected to the LINC will be doing its own load
> balancing, with its own static routes for the test IPs, each router will
> need a unique IP to test against.
>
> For example, there are two routers, A and B. Both use 203.0.178.191 to test
> for internet connectivity via the LINC. If router A is elected the default
> gateway for the LINC, it will still route all connections for 203.0.178.191
> back out via the LINC (static route). Test pings from router B will hit
> router A and loop until the TTL is hit. In this scenario, router B will
> never decide that the LINC gateway is available.
>
> If routers A and B have unique test IPs for testing for Internet
> connectivity via the LINC, the test ping will hit the opposing router and be
> directed out its current gateway, thereby establishing the validity of the
> connection.
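
The loop described above can be reproduced with a toy next-hop model. This is only a sketch: the router names, the "internet" label, the placeholder test-IP names and the TTL value are illustrative assumptions, and it assumes that with two routers on the LINC each one's static route for the test IP points at the other.

```python
# Toy model of the two-router scenario. Nothing here is real router config;
# the tables only record "which hop gets the packet next".

def trace(tables, start, dest, ttl=8):
    """Follow next hops from `start` until the packet leaves or the TTL expires."""
    hop, path = start, [start]
    while ttl > 0:
        table = tables[hop]
        # A static route for `dest` wins; otherwise fall back to the default route.
        nxt = table.get(dest, table["default"])
        path.append(nxt)
        if nxt == "internet":
            return path, True
        hop, ttl = nxt, ttl - 1
    return path, False


TEST_IP = "203.0.178.191"

# Shared test IP: both routers pin it to the LINC, i.e. to each other.
shared = {
    "A": {TEST_IP: "B", "default": "internet"},
    "B": {TEST_IP: "A", "default": "A"},
}

# Unique test IPs: A has no static route for B's target, so A uses its default.
unique = {
    "A": {"test-ip-A": "B", "default": "internet"},
    "B": {"test-ip-B": "A", "default": "A"},
}

if __name__ == "__main__":
    print(trace(shared, "B", TEST_IP))      # bounces between A and B
    print(trace(unique, "B", "test-ip-B"))  # leaves via A's uplink
```

With the shared IP the probe from B never reaches the "internet" hop; with unique IPs it leaves through A after one hop.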

I find whenever I'm faced with an odd looking problem like "find
unique, highly available public IPs that aren't used by users" it's
best to take a step back and look at what problem you're actually
trying to solve. For each pair of machines (A, B) there are the
following questions:

i) Can machine A reach the internet?
ii) Can machine B reach machine A?
iii) Can machine B reach the internet through machine A?

I'd assert that if i) and ii) are true, iii) should also be true (I'm
assuming here that all machines are under your control). If it's not,
it's a config issue which should occur infrequently enough that we
don't mind dropping user traffic until it's detected. With that in
mind we only have to solve i) and ii).

ii) is trivial.
i) is also reasonably simple. You can open a data link socket and
write your own headers to force the health check out the internet link.
If you don't want to do this you could also mark the health check
packets with iptables and then use custom routing rules to send them
out the external interface. Each machine then knows its own health and
allows others to connect to it and query its health state.
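
A rough sketch of the second approach, assuming a Linux box and Python. Instead of marking with iptables it sets the mark directly on the probe socket with SO_MARK (or pins the socket to an interface with SO_BINDTODEVICE), and it uses a TCP connect rather than an ICMP ping. The function names, interface name, mark value and routing table number are all made up for illustration.

```python
import socket

# Linux-only socket options; the numeric fallbacks are the usual Linux values
# and are only used if this Python build does not export the constants.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)
SO_MARK = getattr(socket, "SO_MARK", 36)


def probe_options(ifname=None, fwmark=None):
    """Return the (level, option, value) tuples to apply to a probe socket."""
    opts = []
    if ifname is not None:
        # Pin the socket to one interface, regardless of the routing table.
        opts.append((socket.SOL_SOCKET, SO_BINDTODEVICE, ifname.encode() + b"\0"))
    if fwmark is not None:
        # Tag packets so a policy rule such as
        # `ip rule add fwmark 1 table 100` can steer them out the uplink.
        opts.append((socket.SOL_SOCKET, SO_MARK, fwmark))
    return opts


def uplink_is_healthy(target, port, ifname=None, fwmark=None, timeout=2.0):
    """Try a TCP connect to target:port via the pinned or marked path."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            for level, opt, value in probe_options(ifname, fwmark):
                # Both options normally need root (CAP_NET_RAW / CAP_NET_ADMIN).
                s.setsockopt(level, opt, value)
            s.connect((target, port))
            return True
        except OSError:
            # Timeouts, unreachable hosts and permission errors all count as down.
            return False
```

Something like `uplink_is_healthy("192.0.2.1", 80, fwmark=1)` would then be run on a timer, with the address swapped for whatever target you trust; 192.0.2.1 is just a documentation address.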

Once that's in place, each machine can build a list of candidate
gateways by connecting to every other machine and fetching its health
information. You'll need to tweak the health check interval and
thresholds to strike a balance between fast convergence and stability.
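
One way to picture the threshold tuning, as a sketch only: a peer changes state after a run of consecutive results that disagree with its current state. The class name, the `rise`/`fall` parameters and their defaults are assumptions for illustration, not anything pfSense provides.

```python
class HealthTracker:
    """Flip state only after `fall` straight failures or `rise` straight successes."""

    def __init__(self, rise=2, fall=3):
        self.rise, self.fall = rise, fall
        self.healthy = False   # start pessimistic until proven up
        self._streak = 0       # consecutive results disagreeing with current state

    def record(self, ok):
        """Feed in one health check result and return the current state."""
        if ok == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            needed = self.rise if ok else self.fall
            if self._streak >= needed:
                self.healthy, self._streak = ok, 0
        return self.healthy


def candidate_gateways(trackers):
    """Names of peers currently considered usable as a gateway, sorted."""
    return sorted(name for name, t in trackers.items() if t.healthy)
```

Smaller `rise`/`fall` values and a shorter check interval converge faster but flap more on a lossy NextG or satellite link; larger values do the opposite.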

Simon N

> Adrian
>
> Tim wrote:
>>
>> Universities. ISP FTP servers. Friends' static IPs. Friends' servers
>> hosted somewhere?
>>
>> How does the pool work? Do you have a server on the net that can
>> maintain the pool list itself, by testing for internet connections to
>> these ip address and if they are working, then to keep them on the
>> list, if they aren't then to drop them. And when these networks have
>> internet access, they update the list off your server?
>>
>> Or better yet... How about the root DNS servers? (Ducks from any
>> flak headed my way about extra load and misusing the root DNS
>> servers)
>>
>> Tim
>>
>> On Thu, Sep 3, 2009 at 11:01 AM, Adrian Woodley<Adrian at diskworld.com.au> wrote:
>>
>>>
>>> G'day PLUGers,
>>>
>>> I've just about finished the new design for the networking of the various
>>> mobile comms facilities at work. Each comms unit should have an Internet
>>> connection, either via NextG or a satellite uplink.
>>>
>>> The new network design will allow any combination of comms facilities and
>>> up-links to join the Local Inter-Network Connection (LINC) and share their
>>> routing information.
>>>
>>> Once the facilities are networked, the device with the highest priority
>>> will be elected the default gateway for all other devices. Additionally,
>>> each facility, with its own Internet connection, will load-balance between
>>> its connection and the elected gateway.
>>>
>>> The (minor) issue I'm facing is that for the load balancing to work, the
>>> router (embedded system running pfSense) needs to be able to ping a
>>> (unique) target out on the Internet. What I need is a pool of very stable
>>> IPs to test against, say around two dozen.
>>>
>>> The targets must be IPs, rather than host-names, as until a ping is
>>> successful there is no Internet connection and therefore no DNS.
>>>
>>> Any suggestions?
>>>
>>> Adrian
>>> _______________________________________________
>>> PLUG discussion list: plug at plug.org.au
>>> http://www.plug.org.au/mailman/listinfo/plug
>>> Committee e-mail: committee at plug.linux.org.au