Sunday, April 27, 2008

Server redundancy without DNS and NAT

I had a task: develop a redundancy strategy for customer servers (across globally distributed datacenters) for an application where I can't use DNS (the client software doesn't understand FQDNs) and can't use NAT (the application uses a very proprietary protocol). Usually we use DNS (for example, a Cisco CSS that answers DNS queries with the IP of the right server) or a NAT-based approach: RHI (Route Health Injection) with NAT on the CSS (read the Cisco SRND for this solution). But neither works for this application.
So I think the following solution could work: use RHI, but with loopback interfaces on the servers that all share the same IP address.
First, we create a loopback interface with the same IP on every server; let's use 10.1.1.1/32. The server application is configured to work with this loopback address.
Second, on the gateway routers we set routes to 10.1.1.1 that use the network interface IP of each server as the next hop.
Third, we use Object Tracking to check server health and use it to control propagation of the 10.1.1.1/32 route to the core (I still need to check how to implement this).
Finally, we announce the 10.1.1.1/32 route from every site to the core, with a different metric per site.
If a server fails, Object Tracking suppresses the route announcement and clients move to another server. BTW, clients set the server address to 10.1.1.1 in their client application.
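The failover logic above can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the site names and metric values are invented, the health flag stands in for whatever Object Tracking would report, and the core is treated as a single viewpoint that simply prefers the lowest metric among the routes still announced.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str      # hypothetical site name
    metric: int    # metric of the 10.1.1.1/32 route announced by this site
    healthy: bool  # stands in for the Object Tracking result


def active_site(sites):
    """Return the site that receives traffic for 10.1.1.1, or None."""
    # A site with a failed server stops announcing the route.
    announced = [s for s in sites if s.healthy]
    if not announced:
        return None
    # Among the remaining announcements the lowest metric wins.
    return min(announced, key=lambda s: s.metric)


sites = [Site("dc-east", 10, True), Site("dc-west", 20, True)]
print(active_site(sites).name)  # dc-east is preferred while healthy

sites[0].healthy = False        # server failure in dc-east
print(active_site(sites).name)  # traffic moves to dc-west
```

In a real network each client reaches the announcing site that is closest by total path cost, so different clients may land on different sites; the sketch ignores that.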
Here is a simple picture:

Wednesday, April 23, 2008

Cisco CSG part II

Before we start, a few words about the CSG. There are two CSGs available: CSG1 and CSG2.
First of all, it is a service module for the Cat6500 or Cisco 7600 chassis.
It is a traffic inspection system with billing capability.
CSG1 has 6 processors: one PPC 405 and 5 Intel IXPs. The PPC handles the common tasks: RADIUS traffic processing, communication with Quota/BMA servers, statistics collection, and preparing data and rules for the IXP processors.
CSG2 is based on the Cisco SAMI card and has 5 CPUs, each running its own Cisco IOS instance, plus an Intel IXP network processor that load-balances among them (without the load-balancing hardware it is much closer to the Cisco MWAM board). It has two IXP processors, but one isn't enabled in the current software. I think Cisco plans to use it for load balancing in the reverse direction, to increase load-balancing performance.
CSG1 forwards packets serially across all the IXPs (slow path) or fast-switches them between the first and last IXPs (for example, the first packet of a TCP session takes the slow path and the others take the fast path), with the central CPU preparing and sending instructions to the IXPs. CSG2, in contrast, forwards in parallel across all CPUs and runs traffic inspection software written in C on every CPU.
From a technology point of view I think CSG1 is the more interesting solution, but much harder to develop and support. CSG2 uses modern, powerful CPUs and has higher performance than CSG1.
So this is an L3 routing device, separate from the 6500/7600 supervisor. CSG1 (I don't know about CSG2) doesn't send ARP requests. It only copies ARP information from the chassis (I heard this from a Cisco guy), but I think it inspects traffic and picks up ARP packets from it. I think so because the CSG has a shorter ARP timeout than the supervisor, and in an ordinary situation it can lose a MAC address from its ARP table. If the CSG loses an ARP entry, it can't forward traffic, and it doesn't refresh the entry from the supervisor's ARP table. So my advice from experience: set a small ARP timeout on every interface connected to the CSG. My value is 60 seconds.
How does it all work? First of all, we have to know not only the IP addresses of our users but also their identity. The only way to learn it is RADIUS accounting from the BRAS, GGSN or other access device. So the CSG inspects RADIUS accounting, proxies it, or acts as a server for it. From this information it gets the user's IP and their login or Caller-ID for the current session.
CSG stores all this information in the Known Username Table (KUT). You can see it with the "show ip csg account users all det" command.
After that we can inspect packets and we know which user will pay for them :)
In the next post I will write about the billing/quota system.
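To make the idea of the table concrete, here is a minimal Python sketch of an IP-to-user mapping driven by RADIUS accounting events. It says nothing about how the CSG implements the KUT internally; the class and method names are my own, and I assume only that an accounting Start adds an entry and a Stop removes it.

```python
class KnownUserTable:
    """Toy IP -> user mapping fed by RADIUS accounting (hypothetical names)."""

    def __init__(self):
        self._by_ip = {}

    def on_accounting(self, status_type, framed_ip, user_name):
        # A Start event binds the session IP to the user identity.
        if status_type == "Start":
            self._by_ip[framed_ip] = user_name
        # A Stop event releases the IP so it can be reused by someone else.
        elif status_type == "Stop":
            self._by_ip.pop(framed_ip, None)

    def user_for(self, ip):
        """Return the user who should be charged for traffic from ip, or None."""
        return self._by_ip.get(ip)


kut = KnownUserTable()
kut.on_accounting("Start", "10.20.30.40", "alice")
print(kut.user_for("10.20.30.40"))  # alice
kut.on_accounting("Stop", "10.20.30.40", "alice")
print(kut.user_for("10.20.30.40"))  # None
```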

Tuesday, April 22, 2008

Cisco CSG part I

Do you know Cisco's Content Services Gateway (CSG)? Cisco has created a strange device. I believe the engineers created something more than they intended. As usual, such a thing has a lot of issues and problems. But together with the SCE it can change our view of SP services for end users.
First of all, how can we run this device? Unlike many other Cisco products, the charging system can't work without external components. If you have a Cisco GGSN, then the system can connect to the billing system via Diameter DCCA. But in most cases you don't have the whole system. For example, you have a GGSN from another vendor. Or you don't use a GGSN at all; maybe you are a systems integrator or an xDSL ISP.
I will publish several articles about my experience with CSG testing, implementation and maintenance.
First of all, let's create a lab. We need a RADIUS client, a GTP' (GTP prime) server (or Quota/BMA server), a client side (traffic generator on the user side), and a server side (the responding part).

In my experience, a city of under 1 million people, seen from the point of view of a small mobile operator, generates approximately 400-600 RADIUS accounting events per second and has about 30000-60000 active GPRS (or PPP for CDMA) sessions. Our lab must simulate this city. You can easily estimate the load needed for your environment from this data.
We could use some good and expensive testing tools, but I don't have any :) So my goal is to use free tools.
First, we have to describe all the parts of our solution:
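One rough way to do that estimate is to scale the figures with population. The Python sketch below assumes the load grows linearly and takes 1 million people as the reference size; both are my simplifications, and the function name is made up.

```python
# Reference figures from the text above (city of roughly 1 million people).
REFERENCE_POPULATION = 1_000_000        # assumption: treat "under 1 million" as 1 million
REFERENCE_EVENTS_PER_SEC = (400, 600)   # RADIUS accounting events per second
REFERENCE_SESSIONS = (30_000, 60_000)   # active GPRS/PPP sessions


def estimate_load(population):
    """Scale the reference ranges linearly to another population size."""
    factor = population / REFERENCE_POPULATION
    return {
        "radius_events_per_sec": tuple(round(v * factor) for v in REFERENCE_EVENTS_PER_SEC),
        "active_sessions": tuple(round(v * factor) for v in REFERENCE_SESSIONS),
    }


print(estimate_load(2_500_000))
# {'radius_events_per_sec': (1000, 1500), 'active_sessions': (75000, 150000)}
```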
1. RADIUS client - radclient from FreeRADIUS, shell scripts, and the SeaGull (http://gull.sourceforge.net) traffic generator to simulate load
2. GTP' server (Quota/BMA server) - server simulator from http://ipantenna.com
3. Traffic generator, client side - SeaGull traffic generator, Siege HTTP traffic generator
4. Traffic generator, server side - Web server, FTP server and any other server (for whichever protocol we will test).
In the next posts I will describe all parts of this lab, and we will install and configure it all.
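As a small taste of item 1, here is a Python sketch that generates a batch of accounting Start requests as attribute text that radclient could read from a file. The user names, session IDs, caller IDs and the 10.x.y.z address scheme are invented for illustration, and my understanding that radclient accepts several packets in one file separated by blank lines should be checked against its man page.

```python
def accounting_start(index):
    """Build one Accounting-Request (Start) as radclient-style attribute text."""
    # Hypothetical addressing: map the index onto 10.x.y.z.
    ip = f"10.{(index >> 16) & 0xFF}.{(index >> 8) & 0xFF}.{index & 0xFF}"
    attrs = [
        f'User-Name = "user{index:06d}"',
        "Acct-Status-Type = Start",
        f'Acct-Session-Id = "sess{index:08d}"',
        f"Framed-IP-Address = {ip}",
        f'Calling-Station-Id = "7900{index:07d}"',
    ]
    return "\n".join(attrs)


def build_batch(count, first=1):
    """Join several requests, separated by blank lines (assumed file format)."""
    packets = (accounting_start(i) for i in range(first, first + count))
    return "\n\n".join(packets) + "\n"


# Assumed usage: save the output to a file and feed it to radclient, e.g.
#   radclient -f batch.txt <server>:1813 acct <secret>
print(build_batch(2))
```

This only produces the input data; the real load (hundreds of events per second) still needs the shell scripts or SeaGull around it.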