NeDi Community

NeDi Software Specific => Discovery => Topic started by: cornua on October 26, 2011, 02:25:40 PM

Title: slow discovery
Post by: cornua on October 26, 2011, 02:25:40 PM
hi,

I had v1.0.5 running fine for a while and decided to upgrade to v1.0.7. I did a parallel install in a different directory to have both running at the same time, each using its own database. The problem is that discovery now takes almost 5 days to complete.

We have about 800 routers/switches in the network and a total of about 9000 devices if I count the APs and the IP phones.

Any idea what would slow down the discovery that much? It used to take a few hours.

thanks,
Title: Re: slow discovery
Post by: harry on October 27, 2011, 01:41:59 AM
I think it should be a separate instance on a separate machine. I guess you are running two NeDi processes with two separate databases. That can impact the discovery process.
Anyone, please correct me if I am wrong.

Harry.
Title: Re: slow discovery
Post by: tristanbob on October 28, 2011, 09:35:50 PM
There may be different settings for which types of devices to ignore. For example, if NeDi tries to connect to lightweight wireless access points, it can waste 30 seconds on each AP. This happened to us, so we had to edit nedi.conf to ignore devices with a certain text in the CDP fields.

Title: Re: slow discovery
Post by: cornua on October 31, 2011, 03:36:18 PM
Thanks for the suggestions.
I took pretty much the same nedi.conf settings I had for 1.0.5 (which was working fine) for the new v1.0.7 installation.

I'm now running a single discovery using v1.0.7 and it looks like it'll take a long time to complete.

We use only Cisco devices on our network, and I kept the default nosnmpdev string.
nosnmpdev       IP\s(Phone|Telephone)|^ATA|AIR-LAP11|MAP-|AP(\s|_)Controlled
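
A quick way to check which neighbors a pattern like this would skip is to try it against a few strings. The Python sketch below is only an illustration: the sample strings are made up, and it assumes the pattern is applied as a case-sensitive regular expression search against the CDP text (the thread does not say exactly how NeDi applies it).

```python
import re

# Default nosnmpdev pattern as quoted in the post above.
NOSNMPDEV = re.compile(
    r"IP\s(Phone|Telephone)|^ATA|AIR-LAP11|MAP-|AP(\s|_)Controlled"
)

def is_skipped(cdp_text: str) -> bool:
    """Return True if the pattern matches anywhere in the CDP text.

    Assumption: a plain, case-sensitive regex search.
    """
    return NOSNMPDEV.search(cdp_text) is not None

# Hypothetical CDP strings, for illustration only.
SAMPLES = [
    "Cisco IP Phone 7960",
    "ATA 186",
    "cisco AIR-LAP1142N-A-K9",
    "cisco AIR-LAP1252AG-A-K9",
    "cisco WS-C6503-E",
]

if __name__ == "__main__":
    for text in SAMPLES:
        print(f"{text!r}: skipped={is_skipped(text)}")
```

Under these assumptions, a lightweight AP whose model string does not contain AIR-LAP11 would not be skipped by the default pattern, which is the situation tristanbob describes.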


One thing I did was delete some of the old SNMP community strings I still had configured in nedi.conf. I'll see how it goes, but any other suggestions are welcome.

Also, should anything be done with PHP and/or Perl, such as increasing the maximum memory?

Thanks,
~Alex
Title: Re: slow discovery
Post by: cornua on November 03, 2011, 07:27:39 PM
...and now, even when I run a single discovery with either version, it takes days to complete. :-|
Title: Re: slow discovery
Post by: cornua on November 09, 2011, 11:56:12 PM
Does anyone else have any other ideas? PHP config, re-install?

I'm not sure what to look for at this point.
Title: Re: slow discovery
Post by: SteffenS on November 11, 2011, 12:22:58 AM
Hi cornua,

I had the same problem until about a year ago.
I solved it by splitting the refresh discovery into many discovery jobs running at the same time.

In NeDi 1.0.5, I used many "nedi -AU <different-configfiles>" jobs in crontab, with different netfilter definitions in these config files for each subnet.
Since 1.0.6, I have used "nedi.pl -A <filter>" instead of "-U <different-configfiles>".
That works great!
(6h refresh frequency: at 6, 12 and 18 o'clock (faster) and at 0 o'clock (slower, with neighbor finding, backup, ...))
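
The same idea can be sketched as a small launcher instead of separate crontab lines. This is a hedged Python sketch, not Steffen's actual setup: only "nedi.pl -A <filter>" comes from the post, while the filter values, the executable path and the function names are placeholders.

```python
import subprocess
from typing import List, Sequence

def build_jobs(filters: Sequence[str], nedi: str = "./nedi.pl") -> List[List[str]]:
    """Build one 'nedi.pl -A <filter>' command line per filter.

    The filter strings are passed through untouched; their syntax is
    whatever the installed NeDi version expects (not shown here).
    """
    return [[nedi, "-A", flt] for flt in filters]

def run_parallel(jobs: Sequence[Sequence[str]]) -> List[int]:
    """Start all jobs at once, wait for all of them, return exit codes."""
    procs = [subprocess.Popen(list(job)) for job in jobs]
    return [proc.wait() for proc in procs]

if __name__ == "__main__":
    # Placeholder filters: replace with one real filter per subnet.
    jobs = build_jobs(["<filter-subnet-1>", "<filter-subnet-2>"])
    for job in jobs:
        print(" ".join(job))
    # run_parallel(jobs)  # uncomment on a host where nedi.pl is installed
```

Each job only covers its own part of the network, so the total wall-clock time is roughly that of the slowest job rather than the sum of all of them.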

have fun

Steffen
Title: Re: slow discovery
Post by: rickli on November 19, 2011, 11:50:41 AM
If you use -v you'll see how long each device took. Anything unusual there? The networks I've tested actually got faster than 1.0.5...
Title: Re: slow discovery
Post by: cornua on November 28, 2011, 03:54:32 PM
Hi,

I ran a verbose discovery on a 6503-E; it took 7 min... One thing I noticed is that the BridgeFwd section takes quite a long time, as the SNMP walk times out.

@@@@@@@@@@@@@@@@@@@@@@@
BridgeFwd (SNMP) --------------------------------------------------------------
SNMP:Connect w.x.y.z snmpstring@953 v2 Tout:11s MaxMS:1472
FWDS:Walking BridgeFwd
ERR :Fp953 No response from remote host 'w.x.y.z'
(...)
@@@@@@@@@@@@@@@@@@@@@@@


I'd also guess that a CLI discovery would be faster, but for some reason the username/password authentication passes while the enable step fails. Any idea?

@@@@@@@@@@@@@@@@@@@@@@@
Prepare (CLI)  ----------------------------------------------------------------
TEL :z-cwuser:23 Tout:3s OS:IOS EN:(.+?)#\s?$
CLI2:Matched Username: , sending username
CLI3:Username username sent
CLI3:Matched Password:, sending password
CLI3:Password sent
CLI4:Matched switch>, enabling
CLI7:Matched Password:, sending password
ERR :
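
The "EN:" value in the log above looks like the regular expression for the enabled prompt. Assuming that reading is right (the thread does not confirm it), the Python sketch below shows which prompt lines such a pattern would accept; the prompt strings are examples only.

```python
import re

# Pattern copied from the "EN:" field in the log above.
ENABLE_PROMPT = re.compile(r"(.+?)#\s?$")

def looks_enabled(last_line: str) -> bool:
    """Return True if the line ends like a privileged prompt ('name#')."""
    return ENABLE_PROMPT.search(last_line) is not None

if __name__ == "__main__":
    # Example prompt lines, for illustration only.
    for line in ["switch#", "switch# ", "switch>", "Password: "]:
        print(f"{line!r}: enabled={looks_enabled(line)}")
```

If the device keeps answering with "switch>" or another "Password:" prompt after the enable password is sent, a pattern like this never matches and the login is reported as failed.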

Title: Re: slow discovery
Post by: cornua on December 06, 2011, 12:40:55 PM
OK... I found out that the previous SNMP string was causing the problem. Does anyone know what the restrictions on the SNMP string are? Should any special characters be avoided?

Our previous SNMP string included an @.

Now, even though the discovery time is back to normal, I still have a couple of problems:
- I can't edit the device definition because our SNMP string has a ! in it, so the "Community" field of the Device definition generator is not fully populated.
- On many devices, even if a device definition exists, only half the information gets pulled after discovery.
- Strange behavior: on some distribution switches, polling one pulls all of its information but removes some of the info from the other one (the serial disappears and the IP address changes). The opposite happens if I poll the 2nd distribution switch.

thanks,
Title: Re: slow discovery
Post by: rickli on December 06, 2011, 10:11:41 PM
Cisco uses the @ for vlan indexing sometimes, but I'm not aware of other restrictions. I'll check Defgen, though...

Can you find out more details, when discovering with -v? E.g. what exactly fails?
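
To illustrate why an @ inside the community can be a problem: with per-VLAN indexing the VLAN number is appended as community@vlan, so a community that already contains an @ can be split in more than one way. The Python sketch below uses a made-up community string and does not claim to show how NeDi or the switch actually parses it.

```python
from typing import Tuple

def split_at_last(indexed: str) -> Tuple[str, str]:
    """Treat everything after the LAST '@' as the VLAN index."""
    community, _, vlan = indexed.rpartition("@")
    return community, vlan

def split_at_first(indexed: str) -> Tuple[str, str]:
    """Treat everything after the FIRST '@' as the VLAN index."""
    community, _, vlan = indexed.partition("@")
    return community, vlan

if __name__ == "__main__":
    indexed = "my@community@559"  # hypothetical community "my@community", VLAN 559
    print(split_at_last(indexed))
    print(split_at_first(indexed))
```

Only the first reading recovers the intended community and VLAN; a parser that uses the second reading would query with the wrong community and simply get no answer.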
Title: Re: slow discovery
Post by: cornua on December 07, 2011, 03:20:30 PM
Hi,

It goes fine for most, but it fails at the BridgeFwd walk.
Here's a snapshot of where the problem starts. PM me if you need the full discovery output.

(...)
BridgeFwd (SNMP) --------------------------------------------------------------
SNMP:Connect xxx.xxx.xxx.xxx ROstringwith@in_it@559 v2 Tout:18s MaxMS:1472
FWDS:Walking BridgeFwd
ERR :Fp559 No response from remote host 'xxx.xxx.xxx.xxx'
FWDS:Walking FWD Port to IF index
ERR :Fx559 No response from remote host 'xxx.xxx.xxx.xxx'
SNMP:Connect xxx.xxx.xxx.xxx ROstringwith@in_it@398 v2 Tout:18s MaxMS:1472
FWDS:Walking BridgeFwd
ERR :Fp398 No response from remote host 'xxx.xxx.xxx.xxx'
FWDS:Walking FWD Port to IF index
ERR :Fx398 No response from remote host 'xxx.xxx.xxx.xxx'
SNMP:Connect xxx.xxx.xxx.xxx ROstringwith@in_it@828 v2 Tout:18s MaxMS:1472
FWDS:Walking BridgeFwd
ERR :Fp828 No response from remote host 'xxx.xxx.xxx.xxx'
FWDS:Walking FWD Port to IF index
ERR :Fx828 No response from remote host 'xxx.xxx.xxx.xxx'
SNMP:Connect xxx.xxx.xxx.xxx ROstringwith@in_it@206 v2 Tout:18s MaxMS:1472
FWDS:Walking BridgeFwd
ERR :Fp206 No response from remote host 'xxx.xxx.xxx.xxx'
FWDS:Walking FWD Port to IF index
ERR :Fx206 No response from remote host 'xxx.xxx.xxx.xxx'
SNMP:Connect xxx.xxx.xxx.xxx ROstringwith@in_it@443 v2 Tout:18s MaxMS:1472
FWDS:Walking BridgeFwd
(...)
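
The time lost to these failed walks can be tallied from the -v output with a short script. The Python sketch below assumes that each "No response" line costs roughly one Tout interval (retries, if any, would add to that) and that the log lines look like the excerpt above; the function name and the sample addresses are made up for illustration.

```python
import re
from typing import Tuple

# "SNMP:Connect <host> <community@vlan> v2 Tout:18s ..."
CONNECT = re.compile(r"^SNMP:Connect\s+(\S+)\s+(\S+)\s+v\d\s+Tout:(\d+)s")
# "ERR :Fp559 No response from remote host '...'"
NO_RESPONSE = re.compile(r"^ERR\s*:\S+\s+No response from remote host")

def estimate_timeout_seconds(log_text: str) -> Tuple[int, int]:
    """Return (failed_walks, estimated_seconds_lost).

    Each failure is charged the Tout value of the most recent
    SNMP:Connect line (assumption: one timeout interval per failure).
    """
    current_tout = 0
    failures = 0
    seconds = 0
    for raw in log_text.splitlines():
        line = raw.strip()
        match = CONNECT.match(line)
        if match:
            current_tout = int(match.group(3))
        elif NO_RESPONSE.match(line):
            failures += 1
            seconds += current_tout
    return failures, seconds

SAMPLE = """\
SNMP:Connect 192.0.2.1 community@559 v2 Tout:18s MaxMS:1472
FWDS:Walking BridgeFwd
ERR :Fp559 No response from remote host '192.0.2.1'
FWDS:Walking FWD Port to IF index
ERR :Fx559 No response from remote host '192.0.2.1'
SNMP:Connect 192.0.2.1 community@398 v2 Tout:18s MaxMS:1472
FWDS:Walking BridgeFwd
ERR :Fp398 No response from remote host '192.0.2.1'
FWDS:Walking FWD Port to IF index
ERR :Fx398 No response from remote host '192.0.2.1'
"""

if __name__ == "__main__":
    print(estimate_timeout_seconds(SAMPLE))
```

Applied to the four complete VLAN blocks in the excerpt above, the same counting gives 8 failed walks at 18 s each, i.e. at least 144 s for that part of a single device under this assumption.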
Title: Re: slow discovery
Post by: cornua on December 08, 2011, 04:35:29 PM
Hi,

     Maybe I should start a new post on this, but regarding the issue where discovering one device overwrites data of the 2nd one...

Here's the setup:
- we have 2 distribution switches per site.
- one uplink from each distribution switch to every access switch (providing redundancy).
- HSRP for every VLAN interface between the distribution switches.

Problem:
- polling the distribution switches individually on one specific site works fine.
- however, it removes the serial# and bootimage from the other switch; e.g. poll switch1 OK, poll switch2 OK, but that removes the bootimage and serial# info of switch1. The same happens the other way around.

Not sure if it's related, but I sometimes see a "duplicate IP" message in NeDi's warning messages, sometimes about the HSRP virtual IP address (on both distribution switches), sometimes about an interface/IP address (even though it is admin down on one of the switches).

I thought the primary key was the device name. Am I wrong, or does something rely on a different key?

thanks,
Title: Re: slow discovery
Post by: rickli on December 10, 2011, 05:58:12 PM
Yes, the name is the primary key. What do those names look like? Are both switches shown in Devices-List or just one?

The duplicate IP won't factor in IF admin status, good point though! I'll look into adjusting the event level accordingly...

I also tried reproducing your problem with a "!" in the community, but it works fine here...
Title: Re: slow discovery
Post by: cornua on December 12, 2011, 05:31:22 PM
Hi Rickli,

Thanks again for the reply. Below are the answers to your questions.

Issue#1:
Quote from rickli:
Yes, the name is the primary key. What do those names look like? Are both switches shown in Devices-List or just one?

The duplicate IP won't factor in IF admin status, good point though! I'll look into adjusting the event level accordingly...
Answer:
- They all have unique names, which are the same as their DNS names, e.g. site-dsw1 and site-dsw2.
- Both are shown, except that they overwrite each other's bootimage and serial#.


Issue#2:
Quote from rickli:
I also tried reproducing your problem with a "!" in the community, but it works fine here...
Answer:
- The major issue I had was with devices using a "@" in their SNMP string: discovery gets stuck and times out at the BridgeFwd portion.
- The problem with a "!" is seen when trying to edit the device definition of a device whose SNMP string has a "!" in it. Such devices do get discovered, though.


thanks,