Topics - sjwk

Pages: [1]
1
Discovery / arpwatch data mostly missing
« on: February 16, 2015, 07:28:21 PM »
Evening all,
I've upgraded (via a clean install and wipe of the DB) to 1.4.  Since then, nedi is mostly ignoring data in the arpwatch file.  It claims to read it:
Quote
ARPW:Reading /var/lib/arpwatch/full.dat
ARPW:21297 arpwatch entries used
Yet there are only 10 nodes (and 10 records in the nodarp table) that have IPs, all the rest just have the IP 0.0.0.0.  Have checked the MAC addresses for a number of nodes missing an IP, and they do exist in the arpwatch data file, with an IP associated.
Oddly, the only IPs that are included are on a 192.168 subnet range that we do use, but which isn't the primary real-world network address range.  Even if for some reason it was just filtering to that subnet, there are a lot more than 10 nodes in that subnet in use.  I can't see any real pattern to those 10 - a mix of physical and virtual machines, running both Windows and Linux systems.

I'll have a look through the code and try to work out what it's doing, but in case I've missed something: is there a configuration option that controls which address range is pulled from the arpwatch data?  As far as I can see, the only arpwatch configuration option is the path to the data file.
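For anyone wanting to repeat the "is this MAC actually in the arpwatch file" check described above, here is a rough Python sketch.  It assumes the arpwatch data file has whitespace-separated columns of MAC, IP, timestamp and optional hostname, and that MACs may be written without leading zeros in each octet; both are assumptions about the file rather than anything confirmed in this thread.  The function names and sample data are made up for illustration.

```python
import io

def normalise_mac(mac: str) -> str:
    """Return a MAC as 12 lowercase hex digits, padding each octet to two digits."""
    parts = mac.strip().lower().replace("-", ":").split(":")
    if len(parts) == 1:
        # Already compact, e.g. 64d814604688
        return parts[0].zfill(12)
    return "".join(p.zfill(2) for p in parts)

def load_arpwatch(lines):
    """Map normalised MAC -> set of IPs listed for it (assumed column layout)."""
    seen = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        seen.setdefault(normalise_mac(fields[0]), set()).add(fields[1])
    return seen

# Made-up sample standing in for /var/lib/arpwatch/full.dat
sample = io.StringIO(
    "0:1c:23:a:b:c\t10.1.2.3\t1424110000\thost-a\n"
    "64:d8:14:60:46:88\t192.168.1.20\t1424110100\n"
)
arp = load_arpwatch(sample)
for mac in ("001c230a0b0c", "64d814604688", "ffffffffffff"):
    print(mac, sorted(arp.get(mac, [])) or "not in arpwatch data")
```

Comparing on a normalised form rules out a simple formatting mismatch as the reason a MAC looks "missing" on one side.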

2
GUI / Interfaces with >100% capacity
« on: March 04, 2014, 09:54:58 AM »
Had an oddity last night.  On one switch (Cisco 2960), all interfaces simultaneously showed stupid amounts of traffic.  (highest was 6333% of capacity).  Graph of that interface showed it was supposedly running at almost 800Mb/s over the 10 minute polling period (it's a 100Mb/s interface).  Is there any way to see what went on?

As far as I can see, Nedi reports it's using 64bit counters for that switch (as with all the others); output from discovery confirms this and flags no errors.  The switch has only been up for 46 days, and even if the counters were wrapping, I wouldn't expect every single interface to wrap at the same time.  My first thought was high amounts of broadcast traffic, but the broadcast activity doesn't match the time of the excessive traffic, and I'm seeing similar broadcast activity on all interfaces on all switches, so that's probably normal.  No similar spikes on any of my other switches, including switches connected directly to this one.

As far as I can see, it generated this spike on all interfaces that have been active since the switch started, including interfaces that don't appear to have been active at the time, but interfaces that have never had anything connected had no spike.
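To put numbers on the counter-wrap reasoning, here is a minimal Python sketch of how utilisation can be derived from two octet-counter samples.  It assumes the figure is computed from counter deltas over the polling interval; that is a generic approach, not necessarily what Nedi does internally, and the function names are hypothetical.

```python
def utilisation_percent(prev_octets, curr_octets, interval_s, speed_bps, counter_bits=64):
    """Percent utilisation between two octet-counter samples, absorbing one wrap."""
    modulus = 2 ** counter_bits
    delta = (curr_octets - prev_octets) % modulus
    return delta * 8 * 100 / (interval_s * speed_bps)

def wrap_time_seconds(speed_bps, counter_bits):
    """Shortest time in which an octet counter of this width can wrap at line rate."""
    return 2 ** counter_bits * 8 / speed_bps

def is_plausible(percent):
    """A port cannot carry more than its line rate, so anything over 100% is a bad sample."""
    return 0 <= percent <= 100

# 1.5e9 octets in a 600 s poll on a 100 Mb/s port
pct = utilisation_percent(10_000_000_000, 11_500_000_000, 600, 100_000_000)
print(pct, is_plausible(pct))

# How quickly could each counter width wrap on a 100 Mb/s port?
print(wrap_time_seconds(100_000_000, 32))
print(wrap_time_seconds(100_000_000, 64))
```

At 100Mb/s a 32bit octet counter can wrap in roughly 344 seconds, which is shorter than a 10 minute poll, whereas a 64bit counter would need tens of thousands of years.  So with 64bit counters a wrap is not a realistic explanation, and a reading over 100% points at a bad sample (for example counters being reset between polls) rather than real traffic.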

Anyone have any thoughts?

3
Discovery / Two Cisco switches showing no traffic
« on: January 23, 2014, 06:58:24 PM »
Weird one here.  I have two switches (one last Cisco 2900, one 2960) showing no traffic on any port in the Device-Status view.  Ignoring the 2900 as it's about to be replaced, and probably isn't set up right, all other 2960s work fine and are, on the whole, identically configured (same credentials, same SNMP).  While not all switches are running the exact same version of IOS (12.2.55), there are two others running that same version.

All other data on the misbehaving 2960 is there - it shows interfaces as active, lists the devices connected, shows CDP neighbours and VLANs etc.  But the inbound/outbound counters for all interfaces on that switch just show as zero, and consequently the active interfaces report shows all ports as inactive (as that appears to use traffic to distinguish active vs inactive). If I launch the realtime traffic graph popup for one of the ports, it shows live traffic data.  Looking at the interface stats on the switch shows incrementing counts.

Actually, the values for in/out octets, errors and discards are all zero (though there may genuinely be no errors/discards in the 10 minute poll interval), but the broadcast counter does show values.

Any suggestions where to look to see why this particular switch is not playing ball?


4
GUI / Bugs, fixes and suggestions
« on: January 20, 2014, 11:36:58 PM »
Loving the new look and new features in 1.0.9.

One minor bug - the 'Devices Full' report is incorrectly showing the contact information in the 'Total IF' column - a simple fix:  .../html/inc/librep.php:1463 (in function IntActiv), change $r[2] to $r[3]

And a suggestion (not strictly GUI):
Could the arpwatch data parsing use the timestamp field to pick the most recent entry as the current address rather than the one latest in the file?  Arpwatch will update the timestamp of an earlier (in the file) entry without moving it to the bottom of the file - as a result I have lots of nodes showing up with 'current' IP addresses that date back to when they hadn't been registered/authenticated, or had been using an auto-assigned address when disconnected before getting issued with proper IPs.

I've had a quick poke at misc::BuildArp but I'm not sufficiently caffeinated to tackle that tonight, will update if and when I find time!
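To illustrate the suggestion, here is a minimal Python sketch of picking the entry with the newest timestamp per MAC instead of the last one in the file.  It is not the misc::BuildArp code; it assumes whitespace-separated columns of MAC, IP, epoch timestamp and optional hostname, and the function name and sample data are invented.

```python
def latest_ip_per_mac(lines):
    """Return {mac: (ip, timestamp)}, keeping the entry with the newest timestamp."""
    latest = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip blank or malformed lines
        mac, ip, ts = fields[0].lower(), fields[1], int(fields[2])
        if mac not in latest or ts > latest[mac][1]:
            latest[mac] = (ip, ts)
    return latest

sample = [
    # Proper address: earlier in the file, but its timestamp was refreshed recently
    "0:11:22:33:44:55\t10.0.0.5\t1390000500\tdesk-pc",
    # Old auto-assigned address: later in the file, older timestamp
    "0:11:22:33:44:55\t169.254.7.9\t1380000000",
]
print(latest_ip_per_mac(sample))
```

With "last line wins" the node would end up with the stale 169.254 address; comparing timestamps keeps the 10.0.0.5 entry instead.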

5
Definition Files / 1.3.6.1.4.1.9.6.1.83.10.1 - Cisco SG300
« on: January 20, 2014, 05:32:49 PM »
In case it's helpful to anyone, I've modified the bundled definition for 1.3.6.1.4.1.9.6.1.83.10.1.def - the Cisco (-rebranded Linksys) SG300-10:

I added:
Quote
VLnams  1.3.6.1.4.1.9.6.1.101.48.70
Sysdes  1.3.6.1.4.1.9.6.1.101.53.14.1.11.1
These let it read the system description, and the list of configured VLANs
I also changed to:
Quote
Bimage  1.3.6.1.4.1.9.6.1.101.53.14.1.2.1
as the supplied value was giving me the wrong firmware - it stores two firmwares - active and fallback.  However, details of the firmware turn up multiple times (for instance under 1.3.6.1.4.1.9.6.1.101.2 and 1.3.6.1.4.1.9.6.1.101.53.14.1) but not sure how to determine which of the two firmware slots is active.  I guess there's another OID that identifies which of the two slots is set as active, but not sure how to get Nedi to read a different OID depending on which version is active.  It probably is not possible, and probably not hugely important anyway...

I also changed DisPro from LLDP to CDP as with LLDP it was giving me errors on every scan where I had two of these devices connected together, whereas using CDP appears to work fine:
Quote
LLDP sees 64d814604688,gi9 with unusable IP on gi9

Would there be any reason why using LLDP vs CDP for discovery would be preferable?

I also note that these switches advertise themselves via both LLDP and CDP with their MAC address as their device ID, and their name as the System Name, whereas all other devices I have advertise themselves to neighbours with their name as the device ID.  This confuses Nedi as it can't pair up '64d814604688' (for instance) with the actual device.

Fortunately we only have a couple of these!

6
GUI / IF graph thumbnails mostly empty in Device-Status view
« on: October 14, 2013, 01:58:58 PM »
If I enable the checkbox to show interface graphs in Device Status, I get mostly blank boxes.  'Discards' and 'status' appear to be working, and probably 'broadcasts', but little to nothing on 'traffic' and 'errors', just empty blue and orange boxes (see attached image).  If I click on the box, I do get traffic graphs so I know it is capturing and storing data, and other graphs (CPU etc) all work.  It would be nice to be able to see these graphs at a glance on the overview screen as there have been a couple of recent issues where I ended up having to go through each interface on a switch to see which interface the traffic or errors were coming from.

Is something broken with my setup?  This is on 1.0.8.

7
GUI / Temperature thresholds?
« on: September 20, 2013, 07:30:17 PM »
Is there somewhere that I can set the temperature thresholds so that Nedi doesn't flag up an overheat error every single scan?

I have a couple of 2960-8s that report at 56 and 61 degrees, which on the switch itself shows as 'green' (warning and critical thresholds are 87 and 100 degrees) but which Nedi constantly flags up as being a problem as they are over 50.  Nedi isn't reading the threshold values from the switch, and I can't find anywhere else to set them?

Ta,
Steve.

8
Installation / Nedi website oddness - can't download from Chrome?
« on: February 25, 2013, 04:35:13 PM »
Here's a strange one - I can't download either nedi-027 or nedi-1.0.7 tgz files from the Chrome web browser.  Clicking on the links does nothing, right clicking and save-as does nothing.  Have tried this on three machines and Chrome on both Windows and OSX does the same.  Downloading the nedi-1.0.7-1.patch.zip works in Chrome, as does downloading the tgz from another browser or wget.  Downloading other .tgz files from other sites works fine in Chrome.  AV isn't flagging the files as suspicious or quarantining the downloads.

Can't for the life of me see what is special about these files only from the nedi website - is anyone else seeing this?

The only thing I've noticed is in Chrome's developer console, a warning:
Resource interpreted as Document but transferred with MIME type application/x-tar: "http://www.nedi.ch/pub/nedi-027.tgz".

But I'd expect to be able to save-as regardless of what the browser thinks the filetype is so may just be a red herring?

Steve.

9
Installation / GUI errors on 1.0.6
« on: August 25, 2011, 06:11:16 PM »
I just did a clean install of 1.0.6 - removed and replaced the whole nedi directory, added changes back into the new nedi.conf and ran nedi.pl -i to drop and recreate the database, so there should be no legacy stuff kicking around.

I was getting a frequent warning in the page header about split() being deprecated.  Edited php.ini and restarted apache to remove E_DEPRECATED from the error reporting, which solved that for all pages (that I've found so far) except Monitoring->Health.

On that page, and only that page, I get the deprecated warning at the top of the page (/var/nedi/html/inc/header.php on line 18), plus at the top of each of the columns for level, class and source (/var/nedi/html/inc/libdb-msq.php on line 210).  How is that page still generating a deprecation warning despite my configuring php to hide that error level?

I also get 28 further errors at the bottom of the page (one identical error for each switch):
Warning: array_key_exists() expects parameter 2 to be array, null given in /var/nedi/html/inc/libmon.php on line 491

Any ideas?
Steve.

Edit: have just found that everything in the Reporting menu also gives the deprecated warning in the header so not just that one page, but still no idea why those pages are overriding the error reporting values set in php.ini, yet other pages that include the same header are honouring it?

This is with PHP 5.3.2 from the standard Ubuntu 10.04 LTS repository

10
Discovery / Virtual servers flip-flopping between switches
« on: May 16, 2011, 05:37:03 PM »
Hi all,

This is a minor query, but I'm seeing some of my virtual servers flip-flopping from the switch the physical ESX hosts are connected to, to another one elsewhere on the network.

I have two physical ESX hosts, each of which is connected via two 1Gbps NICs configured as a trunked 2Gbps Etherchannel link to a Cisco 2960G (call that switch1).  That switch is connected to a stacked pair of 3750s at the core (switch2a and 2b), which in turn link to another 2960G (switch3) via another Etherchannel link with one interface on each of the 3750s.

Most of the time, Nedi correctly sees the virtual machines as being on either interface Po1 or Po2 on switch1.  Occasionally (perhaps 2-3 times a day on average, but some VMs seem to go days without doing this, then 20 times in one morning) it detects them as having switched to interface Po1 on switch3.  Never switch2 (or any of the other switches on the network).  On the next scan, it reverts back to the correct pickup on switch1.  It doesn't seem to follow a pattern where all VMs will get picked up in the wrong place and then all will return.  It does appear to just be random as to which ones, if any, will be wrong.

Before I configured the ESX hosts with a trunked connection, when each host just had two regular interfaces, things seemed fine, and nedi only picked up a change in the interface when vCenter moved the virtual machine from one host to another.

Still running 1.0.5 at the moment, (eagerly) waiting for a full release of 1.0.6.

Is this just something I have to live with?  Is there anything I can do to minimise it doing this?  Is it something to do with the order it scans?  Anything odd with either the Cisco or VMware configuration that can lead to this?  It's not something I want to go to hugely disruptive measures to fix, but if it is something obvious then it does at least cut down on displaying a thousand or so interface changes when I view a node...

Cheers,
Steve.

11
Other / Filter empty VLANs?
« on: January 13, 2011, 01:21:08 PM »
Would it be possible to add a toggle to hide empty VLANs in the Device->Vlan view (i.e., where population is 0)?  I distribute VLANs via VTP which means that every VLAN shows up in every switch regardless of whether there is anything currently in that VLAN on that switch.  Or would that not work/be useful?

Steve.

12
Discovery / Nedi scan not exiting
« on: December 06, 2010, 07:54:48 PM »
Hi all,
Having some issues with Nedi in the last few weeks.  It's been running fine for many months but has recently started hanging at the end of the scan (which runs /var/nedi/nedi.pl -pob every 10 minutes from cron).  This means the server slowly runs out of RAM and (within 24 hours) locks up.

Running 1.0.5 on Ubuntu 8.04 server.  Nothing has changed recently on the server other than server updates to the best of my knowledge.

Looking at lastrun:

Quote
OUI:    15969 NIC vendor entries read
DEV:    23 devices read from nedi.devices
LINK:   0 links (WHERE type = "STAT") read from nedi.links

DP-OUI-Discovery (1.0.5) with 1 seed on Mon Dec  6 18:40:01 2010
<snip>
Took 1 minutes

Devs:   21 devices discovered
619 arpwatch entries used.

Not a significant number of devices - all Cisco, appears to complete fine but then nothing.  Have tried running the nedi command from the shell - same thing, it just does nothing after the arpwatch line.

The discrepancy between 23 and 21 devices is down to one switch being powered off, and one that Nedi has a duplicate of after a config change.  Otherwise it's picking up everything it should be.

Is there any way to diagnose what's causing it to stall?
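This doesn't explain the stall, but as a stopgap against stuck runs piling up until the server runs out of RAM, the cron job could go through a small wrapper that refuses to start while a previous run holds a lock and kills the scan after a time limit.  A minimal Python sketch follows, Unix-only because it uses fcntl; the function name, lock path and 540 second limit are assumptions chosen for illustration, and a stand-in command is used so it runs anywhere.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

def run_exclusive(cmd, lock_path, timeout_s):
    """Run cmd under an exclusive lock with a time limit.

    Returns ("skipped", None) if another run holds the lock,
    ("timeout", None) if cmd was killed after timeout_s seconds,
    or ("done", exit_code) otherwise.
    """
    with open(lock_path, "w") as lock:
        try:
            # Non-blocking: fail straight away if a previous run still holds the lock
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return ("skipped", None)
        try:
            # subprocess.run kills the child process if the timeout expires
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return ("timeout", None)
        return ("done", result.returncode)

# Stand-in command so the sketch runs anywhere; on the server in this post
# it would be ["/var/nedi/nedi.pl", "-pob"].
demo_cmd = [sys.executable, "-c", "print('scan finished')"]
demo_lock = os.path.join(tempfile.gettempdir(), "nedi-scan-sketch.lock")
print(run_exclusive(demo_cmd, demo_lock, timeout_s=540))
```

The limit of 540 seconds is just under the 10 minute cron interval.  Note that the timeout only kills the direct child process, so anything that process has forked would need handling separately.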

Steve.

13
Discovery / Issues with population on some switches
« on: April 15, 2010, 01:35:12 PM »
I'm having some issues where some switches are not showing population, although the ports are up.  I've just performed a clean reinstall from 1.0 to 1.0.5, although it's not a version issue as these switches were not cooperating before either.

All switches are Cisco 2960 (edge) or 3750 (core).  Some switches are showing a population count/list, others show nothing for any port, although to the best of my knowledge they are configured identically.  Before the reinstall, the population was showing up on the relevant port of the core 3750 that fed that switch, but that might be left-over from before those switches were replaced with Cisco models a few weeks ago - this was the reason for the reinstall, to clear out all of the data and start from scratch.

Just looking now however, and it appears as though the missing population is now being detected on Po1 on a 2960G that they are not directly connected to (it's the Etherchannel link from the pair of core 3750s, and this 2960 then feeds into the firewall and the Internet).  Am I just being impatient and they will, over time, get detected where they are actually plugged in?  Or is there something else wrong, given that these switches have not detected any population in the past either...?

Looking at the MAC tables via CNA confirms what Nedi is detecting so it must be a switch issue?

Steve.

14
Discovery / Node IPs not on switch subnet not detected
« on: November 16, 2009, 04:14:14 PM »
Afternoon,
I've had a longstanding niggle with Nedi that I'm trying to resolve.

I'm not using management VLANs, but my switches are all configured on an internal 192.168 subnet.  When the scan runs, it correctly sets the node's MAC address but all nodes not on that internal subnet are left as 0.0.0.0.

If a node has an address on the same 192.168 subnet as the switches, then that address is assigned and shown; only nodes with valid external IP addresses are left without one.

Is there some configuration setting to tell it which is the 'real' network range?  It appears to be autodetecting based on the interface used to contact the switches.
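To check whether the split really follows the switch subnet, the node addresses can be sorted into "inside" and "outside" a given network.  A minimal Python sketch using the standard ipaddress module; the /16 mask and the sample addresses are assumptions for illustration, since the post only says "192.168".

```python
import ipaddress

def split_by_network(ips, network):
    """Split IP strings into (inside, outside) relative to the given network."""
    net = ipaddress.ip_network(network)
    inside, outside = [], []
    for ip in ips:
        (inside if ipaddress.ip_address(ip) in net else outside).append(ip)
    return inside, outside

# Made-up node addresses; 203.0.113.0/24 is a documentation range standing in
# for the "real" external addresses.
nodes = ["192.168.10.4", "203.0.113.7", "192.168.200.1", "203.0.113.99"]
inside, outside = split_by_network(nodes, "192.168.0.0/16")
print("inside:", inside)
print("outside:", outside)
```

If every node that ends up as 0.0.0.0 falls in the "outside" list, that supports the idea that something is filtering on the switch subnet.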

Ta,
Steve.
