Cloudtrax reported 3 nodes down last Monday, and Nesrin emailed reporting an outage. She wrote that she tried using the doodad to powercycle it, but no go. I went by this morning and it seems very much like the Scheffler's router is unplugged again. Makes sense – someone was doing work up there on Monday and either did it on purpose or unknowingly jostled it. I didn't have time this morning to find a ladder to go up and check it out.
Went back in the evening: yeah, it was unplugged. Back up now.
The network from .76 (downstairs north) and downstream went offline at some point this week. Similar problem to last time, which didn't really make sense to me. Nesrin tried powercycling it, but it didn't do anything. I finally had time to check it out today. When I got on the ladder, the router was off. The DC power plug wasn't all the way in. It seems like the router slid down the post it's attached to, and because the power cord has little slack, it got unplugged. (Unplugged by gravity!)
I slid the router back up the post, and wedged it above a pipe. If no one touches it, it'll stay in place. And plugged in.
Nothing's changed. I connected to cd:e4, and could ping the 3 south routers, but high packet loss. I SSHed into schefflers to reboot it, and packet loss went down (though ping times were still highly variable).
Connecting to 4d:a5 (downstairs-north), I can ping the router, but nothing else. I sshed in and rebooted it; no change.
I got a ladder and plugged my laptop into the ethernet coming from the gateway router. I could ping it fine. I powercycled the downstairs-north router, and everything came back up. Argh! I don't know why it's working now. Was it a loose cable? Why did it start working when I physically unplugged and plugged, but didn't work when I rebooted from the commandline?
I dunno, but it's working now.
The problem is that everything's showing as down besides the gateway router. The gateway router is up and looks fine. It's got an Ethernet link to the downstairs-north router. I can connect by wifi to the downstairs-north router, and SSH in. I couldn't ping anything else from it. I rebooted it. No change.
I can connect to the wifi on the 00:23:69:b3:cd:e4 router. I can ping mustachios, schefflers and downstairs-south, but the ping times are very long: 1-3 seconds. I can't SSH into the mustachios router: the connection is rejected or times out. I can ssh into the schefflers router. When I ping the mustachios router from there, the ping times are good.
I rebooted the Schefflers router and pingtimes suddenly got better. I guess that's what I was connected to. (Is the wifi on the mustachios router down?)
When I ssh into schefflers, and ssh into mustachios from there, I can get in. I guess the mustachios router is configured to not allow ssh from wifi.
Regardless: there's an obvious connectivity issue between schefflers and downstairs-north. AND between downstairs-north and gateway.
Nesrin emailed a few days ago to say that the south part of the network was down. I checked remotely, and it seemed like the 86 router was offline. I finally had time today to go by. The router had been unplugged. It also suffered some trauma: one of the antennas is missing, connector and all. I plugged it back in and everything came back up. I got a stepladder by knocking on the door of the security office. The nice fellow there didn't know where to find one; I suggested the boiler room. There was, indeed, one there.
Nesrin emailed to say that the wifi wasn't working. I checked this morning on cloudtrax and everything looked fine. When I checked cloudtrax again this afternoon, the whole network was reporting as down. I went by, could connect to the network but couldn't get an ip address. The door to the kitchen was open, so I restarted the Linksys and Bell routers. Soon after, the 3 north routers were up. I powercycled the .86 router, but that didn't help. I had to powercycle both the downstairs south routers to get them to come back up. (To get a ladder, I talked to a security guard, who put me in touch with a maintenance guy, Rocco.)
I got an email from Nesrin today saying that she's tried everything she can, and the network's still down. Cloudtrax reports that it's been down for 8 days.
Upon arrival, I could see the wifi network, could connect and get an IP, but not get online. I plugged into the ethernet on the gateway router, but couldn't get online. I sshed into the router, and it didn't have a DHCP address from the Bell modem/router. Weirdly, the gateway router was reporting only 8 minutes uptime.
I plugged my laptop's ethernet into the Bell modem/router, and didn't get a DHCP address. I powercycled the Bell modem/router, and got a DHCP address on my laptop. I plugged the ethernet back into the router and powercycled it, and everything came back up.
I'll stick around for a bit to keep an eye on the gateway router.
The whole network had been reporting as down in Cloudtrax. I was able to connect to all the routers remotely, and I traced the problem to a change in the Cloudtrax check-in URL. (Previously they were connecting to checkin.open-mesh.com; it's now checkin.cloudtrax.com.) I edited /etc/crontab, then restarted cron, on each router. They're now checking in. (Though the mustachios router still isn't because it needs to be reflashed/replaced.)
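The crontab edit described above could be scripted roughly like this. It's only a sketch, not the commands that were actually run: the function name `update_checkin_host` is made up for illustration, and it assumes the old hostname appears literally in the file.

```shell
#!/bin/sh
# Replace the old check-in host with the new one in a crontab file.
# Usage: update_checkin_host FILE
update_checkin_host() {
    file="$1"
    # Write to a temp copy first, so this also works where sed has no -i flag.
    tmp="${file}.tmp.$$"
    sed 's/checkin\.open-mesh\.com/checkin.cloudtrax.com/g' "$file" > "$tmp" \
        && mv "$tmp" "$file"
}

# On a router this would be something like:
#   update_checkin_host /etc/crontab
# followed by a cron restart (the restart command depends on the firmware).
```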
Cloudtrax reported that the -whole- SLM network went down sometime on Saturday. I went by today, straight to the Market Kitchen. I rebooted the gateway router, and everything came back up. I also hard-coded the opendns addresses into the gateway router. It's back up now.
Cloudtrax reported that the bottom half of SLM went down on Monday, in that same way that it often does. Yesterday Nesrin emailed to say that she tried rebooting it using the weird switch, but that it didn't do anything. This morning I went by, and found the router unplugged – the power cable had been pulled out on the router side. My guess is that the adjacent Christmas decorations had been installed on Monday, and in the process someone accidentally unplugged the router. It's back up now.
On Tuesday (13th) afternoon, Cloudtrax reported that the schefflers router (86) went down. I went by this morning, powercycled it using the switch, and it came right back up.
I also created a simplified how-to on resetting the schefflers router, and shared it with SLM tenants, so that they can do the reset themselves in the future. http://wiki.wirelesstoronto.ca/fixslm
Cloudtrax reported a few days ago that the south three routers were down (a common occurrence). I went by this morning and restarted the Schefflers router using my hacky switch. The routers came right back up. (We still need to replace the Mustachios one, tho…)
I went by today to try to fix the problem. I went first to the mustachios router; the link light on the port connecting it to the schefflers router was on. I plugged that ethernet into my laptop, assigned a static IP address, and I couldn't ping. I went to the schefflers side, plugged the ethernet into my laptop, and I *could* ping the downstairs routers. I rebooted the schefflers router, and everything seemed to start working again. All it needed was a powercycle!? I should've tried that last time I was here (since we can now do it from the ground!) I don't understand why the router in this location is always flaking out in different ways – even having replaced the router *and* the powersupply, the router in this position is always flaky.
Cloudtrax reported a few days ago that the downstairs south router stopped checking in. I went by this morning, and my laptop was able to connect to the downstairs south router, using a static IP address. I was able to ping both the downstairs south router (.40) and the mustachios router (.10), but not the others. So the problem is the line running from the schefflers router to the mustachios router. Troubleshooting that (checking the link lights, and running a cable test) is best left for after-hours.
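The reasoning above (ping outward along the Ethernet chain; the break is just before the first router that doesn't answer) can be sketched as a small helper. The names `probe_ping` and `first_unreachable` are hypothetical, and the ping flags are those of common Linux ping; BusyBox or macOS ping may need different ones.

```shell
#!/bin/sh
# Default probe: a single ping with a 2-second timeout.
probe_ping() { ping -c 1 -W 2 "$1" > /dev/null 2>&1; }

# Walk the hosts in the order given (nearest first) and print the first
# one that does not answer. Returns 1 and prints nothing if all answer.
# Set PROBE to another function name to swap in a different check.
first_unreachable() {
    probe="${PROBE:-probe_ping}"
    for host in "$@"; do
        if ! "$probe" "$host"; then
            echo "$host"
            return 0
        fi
    done
    return 1
}

# Example, plugged in at the downstairs south router (.40), with mustachios
# at .10; the schefflers address used here is an assumption:
#   first_unreachable 192.168.1.40 192.168.1.10 192.168.1.86
```

If the first unreachable hop is the schefflers router, the suspect is the link between it and the last router that did answer.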
Cloudtrax reported that all routers went offline just before 8am on July 7th. Why won't this network stay up? I went to the kitchen today – the ethernet between the gateway router and the modem was unplugged. I plugged it back in and everything came up.
I went by today to swap in a new powersupply, and install a remote switch which we can use to reboot the schefflers router without climbing up a ladder. (Olympic Cheese's handy red-topped stepladder is now being stored in the little space that the sliding gate folds into during the day.) To reset the power to this router from the ground: on top of the heater near the ceiling on the south wall, just before the outside doors, there's a phone jack with a wire looped back into itself. Unplug the wire to cut the power to the router. Plug it back in to restore power.
After turning the schefflers router back on, the south part of the building can now get online. However, I can't ping the mustachios router. (The downstairs south router – which is beyond it – *is* online, so clearly the mustachios router isn't out entirely.)
The south half of the market is down again. To recap the problem: Once we got the network up again (on May 23), we were having the occasional problem of the router near schefflers locking up, and no longer passing traffic. Consequently, the mustachios and south-downstairs routers go offline too. A simple powercycle brings the router back up, but you've got to get on a ladder to get access to the router.
On the 30th I swapped in a new router, but the new router failed in the exact same way on June 1st. Which likely means that it's a problem with the power supply. I'm out of town 'til July 2nd, so I'm hoping someone else can swap in a new power supply.
Barely an hour after I left yesterday, the schefflers router crashed again. I went today and it was dead: when I plugged my laptop into it I got a link, but couldn't pass any traffic (with a static IP). I swapped in router 86 (the one I was going to use to replace the mustachios router).
The gateway router seems like it might still be rebooting – its current uptime is less than an hour. I disabled the wifidog gateway on it, so that it's not a hassle for users.
The new router table:
|router||ethernet MAC addr||IP addr||VPN IP|
|mustachios [old firmware]||00-13-10-30-f8-a3||192.168.1.10||-|
In the afternoon on the 25th, the schefflers, mustachios and south-downstairs routers stopped responding. Today I went by and powercycled the schefflers router (that brought it back up), removed the old autooz router, and replaced the north-downstairs router (because the one that was there was from the original install, running a pre-whiterussian firmware, and so I couldn't set up the monitoring cronjob on it).
I put the monitoring cronjob on the schefflers and south-downstairs routers. It was already on the new north-downstairs and gateway routers. Only the Mustachios router doesn't have it, and that's because it too is running vintage firmware. I have a router to swap in, but didn't bring snips (reqd to cut the ziptie that's holding the router in place), nor wire or a new ziptie to attach the new router securely. Maybe sometime this week, otherwise it'll wait 'til July.
Also, the gateway router seems to be rebooting a lot. (Generally short uptimes, and a tenant reported that she kept getting taken back to the captive portal.) I swapped in a new power supply, and so far so good (one hour later). If it keeps rebooting, the interim measure would be to disable the wifidog gateway.
At some point in November, the hotspot at SLM went down and stayed down. The Internet connection we were using was no longer reliable, and we were experiencing a lot of interference inside the market. We made a few attempts to get it going, but it was no use. Over the winter I met with the market supervisor to discuss what to do with the network (upgrade it or disable it?). We decided to do basic upgrades to get the network running again, including installing a new Internet connection. It took until last night for the DSL line to be installed and for the network to be brought back online.
The new network setup: The DSL modem/router is in the Market Gallery (2nd floor, west). Our gateway router is plugged into that. A new Ethernet cable runs from there to the 'north downstairs' router (in the basement). Existing Ethernet runs from there to the other routers. Updated router table:
|router||ethernet MAC addr||IP addr|
|schefflers (old gateway)||00-16-b6-19-93-7a(?)||192.168.1.2|
As I write this, I'm not able to ping the Schefflers, Mustachios and south downstairs routers (from the gateway router). I could less than an hour ago.
Once they're back up, I will add the rest of the routers to this CloudTrax monitor: http://www.cloudtrax.com/overview2.php?id=wt-slm
The routers check in to CloudTrax using this line in /etc/crontabs:
|*/2 * * * * /usr/bin/wget "http://checkin.open-mesh.com/checkin-batman.php?ip=X.X.X.X&mac=Y:Y:Y:Y:Y:Y&robin=r2000&batman=0.5.6-r0&memfree=13388&ssid=wirelesstoronto&pssid=-none-&users=0&kbup=5&kbdown=8" > /dev/null|
Where X.X.X.X is the ip address (beginning with “5.”) assigned by the CloudTrax tool when you add the node to that network, and Y:Y:Y:Y:Y:Y is the router's Ethernet interface address from the table above.
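To avoid hand-editing the placeholders for each router, the line could be generated. This is a sketch only: `build_checkin_url` and `build_crontab_line` are made-up names, and the fixed values (memfree, users, kbup, kbdown and so on) are simply copied from the line above rather than measured.

```shell
#!/bin/sh
# Print the check-in URL for one router.
# $1 = CloudTrax-assigned IP (must begin with "5."), $2 = Ethernet MAC.
build_checkin_url() {
    ip="$1"; mac="$2"
    case "$ip" in
        5.*) ;;
        *) echo "IP should be the CloudTrax one beginning with 5." >&2; return 1 ;;
    esac
    printf 'http://checkin.open-mesh.com/checkin-batman.php?ip=%s&mac=%s&robin=r2000&batman=0.5.6-r0&memfree=13388&ssid=wirelesstoronto&pssid=-none-&users=0&kbup=5&kbdown=8\n' "$ip" "$mac"
}

# Print the complete crontab line (every 2 minutes, output discarded).
build_crontab_line() {
    url=$(build_checkin_url "$1" "$2") || return 1
    printf '*/2 * * * * /usr/bin/wget "%s" > /dev/null\n' "$url"
}

# Example (the IP here is invented):
#   build_crontab_line 5.1.2.3 00:16:b6:19:93:7a
```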
Wifidog has been emailing out lots of 5-minute outage reports, so it sounds like our rebooting problem is back. I remotely turned off wifidog just so that users aren't having to constantly log back in.
Wifidog reported that the router went down on Sunday afternoon. 2 months after the last renewal – I forgot to put it in my calendar. I went by this morning to renew. Same as ever; just three things I noticed:
Wifidog reported that the router went down yesterday (Monday) morning. I checked from home this morning, and it was still down. When I arrived at the market, I could see the wirelesstoronto network (from the gateway router), but couldn't get a DHCP address, OR ping it when using a static address – but that could've been our ongoing interference problem. The router itself was now checking in to wifidog. (I didn't do anything.) The DHCP server seemed flaky, so I rebooted the router. I could connect, but it was flaky (again, maybe the interference?), so I went downstairs. I could connect to the wifi on the downstairs-north router, but couldn't ping anything, even with a static IP. The downstairs centre and south routers are working fine. I rebooted the north one (by sshing into it over Ethernet), and nothing changed. It seems like the wifi on it is turned off?
I went to the Market. The routers were powered on, thankfully. So all I needed to do was renew the onezone account, and it all came up.
Gabe's out of town, and Jon volunteered to go to the Market to investigate. He phoned Gabe from the Market. From his followup email:
I examined the routers at StL Market - as far as we could deduce, the routers had power available but wouldn't power on.
Our power bar appeared to be supplying power. I tried plugging the routers into an outlet on the power bar that we know is working, one at a time – no luck.
I realise now I could have tried plugging the routers individually into the wall socket (well, the one attached to the “Scheffler's” sign) which looked to be offering power to the power bar. I inferred that it was, because the Ecolab unit on the wall down below (north wall, just inside the door) seemed to be powered on and was plugged into our power bar.
The problem could be the power supplies, or the routers themselves – not sure. Gabe said he had just replaced one of the power supplies recently.
Nesrin emailed yesterday to say that the service wasn't working; maybe it expired? I renewed it. I noticed high latency (>600ms) on pings to 172.20.1.1, and upon investigation noticed that the autooz router was connecting to OZ on channel 1, which is the same channel that the WT gateway router is using. I changed the WT router to use channel 6.
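For reference, the reason 1 to 6 is a sensible move: 2.4 GHz channel centres are 5 MHz apart while a wifi signal is roughly 20 MHz wide, so two channels fewer than 5 apart overlap. A quick check along those lines (the function name `channels_overlap` is made up, and it only considers channels 1-11):

```shell
#!/bin/sh
# Return 0 (true) if two 2.4 GHz channels overlap, 1 (false) otherwise.
# Rule of thumb: channels fewer than 5 apart interfere with each other.
channels_overlap() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    [ "$diff" -lt 5 ]
}

# Example:
#   channels_overlap 1 6 || echo "1 and 6 are clear of each other"
```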
I also swapped in a new power supply for the gateway router (which I'd been meaning to do for a long time). I also turned wifidog back on.
The uplink went down on Friday or Saturday. When I arrived today, everything was actually working, but the wifidog router rebooted three times in five minutes. It must be a problem with the powersupply. I don't have a spare one with me, but in the meantime I've turned off wifidog. (So that users aren't having to constantly log back in.) I'll make a point to swap in a new powersupply this week.
I swapped in a new gateway router because the previous one appeared to be powercycling every half-hour or so, forcing everyone to log in again. The new one is router # 40.
The new router is powercycling in the same way. Weird. The autooz router is plugged into the same powerbar, and it's fine. Maybe the powerbar is bad? I'll try moving the plugs around on it to see if that changes anything.
Separate problem: I'm seeing lots of 30+ second pauses in traffic. I've noticed this on the main floor for a while, but now I'm seeing it in the basement too. It comes and goes, but it makes the connection damn near unusable. Check for channel conflicts and (more likely) other RF noise.
I got an email yesterday pointing out that the wifi was down. The logs show that it was down since Saturday. Weird, 'cause it's not renewal time. I went by. It looked like it just wasn't logged in, but the onezone-login.sh script wasn't making it go. (It was getting back “Session Timed out. Error in Proceeding Further. Please close and restart your browser.”) I was in a hurry, so I logged in manually.
I forgot to renew again. I did it over the phone, but it didn't start working until I issued this command from the autooz router: curl https://phc.prontonetworks.com/cgi-bin/authlogin -A 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 2_2_1 like Mac OS X; en-us) AppleWebKit/525.18.1 (KHTML, like Gecko) Version/3.1.1 Mobile/5H11 Safari/525.20' -d 'serviceName=ProntoAuthentication&userId=wt2&password=XXXXXXXXXXXXXXXXXXXX&button=Submit' -e "http://www.onezone.ca/wifi/wifi_login.html?wispId=5338&nasId=00:1b:24:78:ab:82&newReg=Y&freePlan=N" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" -H "Accept-Language: en-us,en;q=0.5" -H "Accept-Encoding: gzip,deflate" -H "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" -H "Keep-Alive: 115" -H "Connection: keep-alive" -H "Expect:" -k --trace -
Last night I got an email notification that the auto-oz router at SLM was down… obviously, because I forgot to renew the account again. I phoned onezone to ask if they can set up auto-renew, and they said they couldn't. But they could renew it for one month via phone. So I did that, and then went to the Market to make sure it got picked up. It hadn't (service was still offline, showing the built-in wifidog error page), so I:
- sshed into the gateway router (I figured out what the password problem was; all's good now)
- sshed into the autooz router
- issued these commands:
- it came right up
On July 23 I got an email saying that the service was down again; it was displaying that dumb Rogers page. I'd been working on configuring a Linksys router in client mode, connecting to the OneZone network and sharing the connection via Ethernet. I brought it to the market to see if I could get it working. It worked. I took the Rogers modem home, but figured I'd wait before cancelling the service. You can monitor the OneZone client router here: http://dashboard.open-mesh.com/overview.php?id=wt-slm
A month later the service went down again, because the OneZone plans don't auto-renew. I renewed it, and it came right back up.
There's a problem with the gateway router (the one running wifidog) in that I can't SSH into it. I dunno how to get in; I think it just needs to be swapped out.
I got an email on June 10th (just one day after my last visit!) saying that the service was down; instead of the login screen, it was showing a Rogers page. It took me until this morning to get here. I called Rogers, and eventually reached someone who explained that the service had been disabled because they detected viruses on the network. So this afternoon I swapped the *real* SLM modem back in, and it works now. (I'd swapped it with the roach coach one a while ago.) I'm hoping that their idiotic policies around disabling service don't apply to business service.
I got an email last night from one of the downstairs vendors (Nesrin), saying that the wifi was down. I ssh-ed into the gateway router, and couldn't ping *any* of the downstairs routers. I went by this morning, and everything was up except the south end downstairs router… which was unplugged. I pointed it out to Nesrin, plugged it back in, and it came right back up. I dunno what caused the larger outage last night.
I (Gabe) received a couple of emails last week from tenants at the south end of the basement, saying that the network was down. I went by last week, and everything seemed ok: I connected to the router as Mustachio's, and was able to ping all the other routers. I got another email this week saying it was still down. When I came back on Apr 14, I realized that the southern-most downstairs router was pingable via Ethernet, but wireless clients weren't getting DHCP addresses (these are assigned by the gateway router). Weird. I rebooted it and messed around, and got nowhere. I took the router home with me.
I recreated a comparable network at home, and set up a new router to swap in. (I set up a gateway router, then a replacement router, which I configured as a 'bridge', with the LAN IP 192.168.1.40.) I came back to install it, and at first was experiencing the same problem – I could see the wifi signal off that router, but couldn't get a DHCP address through it. I carried my laptop up the ladder and plugged it into the Ethernet cable, to eliminate that as the problem – I ping-flooded 192.168.1.1 with very low packet loss, so it wasn't the cable. I then noticed that all the other wifi networks that my laptop could see were channel 11. I sshed into the new router via Ethernet, and changed its wifi channel to 11 (it had been 1), and suddenly it started working! Moral: Channel 1 (and maybe others) at SLM is unusable, at least sometimes!!
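For the record, a channel change like this over ssh might look roughly as follows. This is a sketch under assumptions, not a transcript: `channel_cmds` is a made-up helper, and `wl0_channel` is assumed to be the nvram variable the firmware uses for the radio channel. The helper only prints the commands, so they can be checked before being sent to a router.

```shell
#!/bin/sh
# Print the nvram commands to set the wifi channel (1-11 only).
channel_cmds() {
    ch="$1"
    case "$ch" in
        1|2|3|4|5|6|7|8|9|10|11) ;;
        *) echo "channel must be between 1 and 11" >&2; return 1 ;;
    esac
    echo "nvram set wl0_channel=$ch"   # assumed variable name
    echo "nvram commit"                # persist across reboots
}

# Example (the root login is an assumption):
#   channel_cmds 11 | ssh root@192.168.1.40 sh
# The radio probably needs a restart, or the router a reboot, to pick it up.
```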
Also note that the mezzanine router appears to be back up (after a year or two!?), but WDS is down 'cause it and the gateway need to be reconfigured. I think we'd need physical access to the router to make this work.
|far south downstairs||00:23:69:b3:cd:e4||192.168.1.40|
We received two emails on Wednesday from people at SLM saying that the wifi had been down for a few days. Wifidog was reporting everything as up. I sshed into the gateway router, and then sshed into each of the other routers, and rebooted all four routers. Users responded, reporting that that fixed it.
Wifidog reported that SLM went down on Tuesday afternoon. Michael went by on Wednesday morning. The wimax modem was acting weird (lights not flashing in the expected sequence). Gabe went by on Thursday morning with a new power supply – it fixed the problem. Gabe also installed a new powerbar.
Wifidog reported that SLM went down this morning. I went by; the gateway router wasn't sending out beacons. Turns out it was unplugged. The guy from the cheese shop just to the north said that some guys were working up there this morning, and that they'll probably be back tomorrow. So it might go down again. Seems like it was a simple mistake; the plug likely simply fell out of the extension cord.
Wifidog reported that SLM went down at about noon today. I dropped by just after 3pm, and it had come back up by itself. Maybe there were people working in the area.
I noticed, though, that the mezzanine router was up… I hadn't seen it in a while. It was loud and clear at the northwest corner upstairs, but clearly wasn't connected to the rest of the network. This is a problem, and can only be fixed by getting physical access to the router. (Since the 192.168.1.1 router has changed, the mezzanine router needs a new MAC address specified for its 'wl0_wds' in nvram.) The 192.168.1.1 router will need wl0_wds set too. And maybe some other things.
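A sketch of what setting 'wl0_wds' might involve once someone has access to the router. The helper names are hypothetical, the MACs on this page appear in both dash and colon form so the sketch normalizes them first, and whether nvram wants the colon form (and which other variables need changing) is an assumption that should be checked on the router itself.

```shell
#!/bin/sh
# Convert a MAC such as 00-16-B6-DB-11-59 to lowercase colon form,
# and reject anything that isn't six hex pairs.
normalize_mac() {
    mac=$(printf '%s' "$1" | tr 'A-F-' 'a-f:')
    printf '%s\n' "$mac" | grep -Eq '^([0-9a-f]{2}:){5}[0-9a-f]{2}$' || return 1
    echo "$mac"
}

# Print the nvram commands that point this router's WDS link at a peer.
wds_cmds() {
    peer=$(normalize_mac "$1") || return 1
    echo "nvram set wl0_wds=$peer"
    echo "nvram commit"
}

# Example: on the mezzanine router, using the new 192.168.1.1 router's MAC:
#   wds_cmds <MAC of the 192.168.1.1 router>
```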
And for the record:
|far south downstairs||00-16-b6-db-11-59||192.168.1.40|
Wifidog reported that SLM went down on the 27th. I went by this morning – the extension cord that the router and modem were plugged into was half-unplugged. I shoved it back in and everything came back fine. -Gabe
Wifidog reported that SLM went down yesterday afternoon. I've been meaning to go by anyway to swap in the permanent SLM modem. The router was reporting (logread) that it couldn't find the external interface (I don't remember the exact message). I restarted the router, and it did the same thing. I restarted the modem and it started working. Then I swapped in the new modem. Everything's working fine. -Gabe
At 7am on October 16th, the borrowed wimax service expired, so the Market went offline again. This morning I (Gabe) brought my personal wimax modem to the Market to get it back online. A permanent, business wimax modem is on its way.
I (Gabe) went by SLM again this morning, to take a look at things, and call EiCat. Nothing had changed, I left a message for EiCat saying as much.
In the afternoon I got a call from Andrea, saying that it's still not working for her – she sees the wirelesstoronto wifi signal, but can't connect to anything.
I went back by, and indeed, things were screwed up. This is owing to a not-fully-clean config on the new router I swapped in (#50), as well as that router #40 (the new one installed downstairs) still had dnsmasq running. (!!!) I ran “firstboot” on router #50 to clear it out (then reinstalled all the apps), and chmod -x'ed and kill -9'ed dnsmasq on #40. Fixed now, and wifidog is once again working.
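The dnsmasq fix on #40 amounts to two steps: stop the init script from running at boot, and kill the instance that's already running. A minimal sketch, with a made-up function name and an assumed init script path:

```shell
#!/bin/sh
# disable_service INIT_SCRIPT PROCESS_NAME
disable_service() {
    script="$1"; proc="$2"
    # Remove the execute bit so the script is skipped at the next boot.
    chmod -x "$script" || return 1
    # Stop the running instance, if there is one; ignore "no such process".
    killall -9 "$proc" 2> /dev/null || true
}

# Example (the path is an assumption; check what's in /etc/init.d first):
#   disable_service /etc/init.d/S50dnsmasq dnsmasq
```

Only the gateway should be handing out DHCP addresses, so a bridge router with dnsmasq still running will answer requests it shouldn't.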
In order to make sure that the service is working again in time for the Market's opening tomorrow, I (Gabe) went down with a wimax modem and a spare router. I swapped the spare router (#50) with the original one, and plugged it into the wimax modem. I left the original one plugged into the DSL line. My rationale was, this way we'll be able to see when the DSL line comes back up.
For some reason, authentication was screwing up on the new router. I'd log into wifidog, get the portal page, and when I tried to go to a website, it brought me back to the login page. Frustrated, I turned off wifidog.
Everything was working ok as far as I could tell. I didn't try connecting from downstairs.
Wifidog reports the node went down around 1pm on Monday (17th). I (Gabe) went by yesterday to confirm that it's essentially the same problem… this time, the DSL light was on, but the PPPoE client couldn't find the server (timeout). I phoned EiCat, left a message, and they phoned me back a few hours later. I spoke to someone there this morning. The suggestions are: 1) do a hard reset of the modem, meaning turn it off and keep it off for five minutes, which causes a reset of some sort at Bell's side; 2) try a different modem; 3) try authenticating with username “test@test” and password “test”, and see what happens.
They assured me that once this problem is resolved, they will open a ticket with Bell to try to get some answers about why the line is so unstable.
The only other thing I can think to try is relocate the modem someplace closer to the jack, and double-check the filters. I love the Market, but I'm sick of going there eight times a month to troubleshoot the DSL line.
It happened again. As of 10am on Thursday August 30th, the Market is offline. I went by today, and the “ADSL” light on the modem was flashing – there's no DSL sync. I phoned EiCat and left a message describing the problem.
In retrospect, it may have been worth checking in with the folks at Scheffler's, to see if they'd made any changes to their phone cabling.
[It came back at about 8:00PM.]
IT HAPPENED AGAIN! As of 3pm yesterday the Market is offline. I went by today and it's the same thing as last time… can't find the PPPoE servers, so presumably the DSL is dead. I restarted the modem and the router a few times, plugged the modem into my laptop, etc. I phoned EiCat and they opened a ticket with Bell. This probably won't get fixed over the long weekend; we'll see.
[It came back at about 7:30PM.]
It came back with no intervention from us. Presumably Bell fixed whatever they had broken.
No end to problems at St. Lawrence Market.
On Sunday I (Gabe) (stupidly) remotely changed the DSL loginid (back to “@eicat.ca”), as per instructions from EICatalyst. The router didn't come back up. I biked down to connect to the router locally, switch the loginid back (to “@canadahighspeed.ca”), and it still wasn't working.
This morning I went in to restart the dsl modem. Didn't help. I plugged the modem into my laptop and tried to login via PPPoE that way. Still no luck – it couldn't find the PPPoE authentication server. I swapped in a new ethernet cable between my laptop and the modem, and it didn't help. I phoned EICatalyst, and they said that Bell's been having a bunch of problems lately. (30,000 DSL users had no service yesterday in Montreal.) They opened a ticket with Bell, and said to call in if I get it working on my own.
The problem now is definitely either with Bell, or with the DSL modem. I didn't have time today to pick up another DSL modem from home. So the Market is still down.
(This update is a little late.)
Gabe and Edward went to the Market in the morning on Friday June 29th, to meet with Jorge to try to figure out the source of the interference. Long story short, the problem was not caused by the air conditioner, but by four mini X10 security cameras which had just recently been installed in one of the shops. As soon as they were turned off, everything came back great.
Edward and Gabe installed the third downstairs router, and the coverage is great.
Edward and I (Gabe) went to the market today. The intent was to install the new router downstairs, and to replace all the other routers, since they seemed to be all flaking out. I prepped a whole new set of routers in advance to swap in.
Upon arrival the situation seemed basically the same as when I was last there: no wifi on the main floor. I swapped the primary router in, and nothing changed… with 2 cm between the router antenna and my laptop, I could pick up the signal only for a moment every few seconds – not enough to even associate with the network. So the problem was not with the router – it's something environmental. I brought my WiSpy (2.4GHz spectrum analyzer) – it revealed that there's an enormous amount of very consistent noise coming from someplace on the main floor of the Market. The noise was less at the very north and south ends of the building, as well as downstairs. In the brick entranceway there was also little noise, and none outside. I took the new router to a fairly "quiet" spot in the front 'lobby', plugged it in, and my laptop could see it just fine. I did the same test by the west central door on the main floor of the Market, and got nothing.
As it turns out, a giant new air conditioning system was installed about two weeks ago. Some of the tenants have complained about the (audible) noise level – this seems consistent with our discovery of a high level of RF noise.
Until this issue is resolved, wifi on the main floor will simply not work. For what it's worth, channel 11 seems to be the least noisy – other channels are literally unusable.
I switched the two downstairs routers to channel 11, and the service there works well, albeit with a shorter range than I remember.
We didn't install the third downstairs router because we couldn't find an AC plug in the area where we need to install it (and I didn't bring any extension cords), and there were people in the way doing plumbing(?) work. I suggested that we install PoE gear instead of dealing with what looks like a chaotic power situation in that area.
Edward said that he will discuss with Jorge:
I have never witnessed RF noise so powerful that it disrupts wifi completely, even at a range of 2 cm. I have essentially no experience in evaluating these situations, but if I spent a lot of time on the main floor of the Market, I would be concerned about exposure.
Yesterday wifidog was reporting SLM going up and down. We got several emails from people reporting no access, or very sporadic access. I (Gabe) went in today, and it's a mess.
I couldn't see any wifi coming off the primary or downstairs-centre routers. The downstairs-north router was working. Through it, I got a dhcp address, and could ssh into the other 2 routers. I rebooted both using this method, but when they came back up, I still couldn't see their wifi.
After a bit of fiddling, I did start seeing the wifi from the downstairs-centre router. But I didn't get a dhcp address, and when I set a static address on my laptop, I still couldn't ping anything.
I did notice incredibly high latency and packet loss on the one working wifi connection, downstairs-north. This would vary a lot, from 30% to periods of 100% packet loss, and ping times sometimes over 40 seconds. For a while the entire thing cut out – I was still connected to the wifi, but couldn't ping anything.
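To put numbers on loss like this over a longer run, the percentage can be pulled out of ping's summary line. A small sketch (the function name `loss_percent` is made up; it assumes the summary contains the usual "N% packet loss" wording):

```shell
#!/bin/sh
# Read ping output on stdin and print just the packet-loss percentage.
loss_percent() {
    sed -n 's/.*[ ,]\([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Example:
#   ping -c 20 192.168.1.1 | loss_percent
```

Logging this every few minutes from a laptop near the router would show whether the loss comes in bursts or is constant.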
According to the people I talked to, the problems began sometime on Monday.
Radio interference *might* explain the latency/loss issues on the downstairs-north router, but it seems too extreme to explain it – I was sitting less than 20 feet away from the router.
The mezzanine router seems to be completely offline. This is easy to imagine, since the mezzanine was renovated recently.
I think the way to go here is to swap in an entirely new set of routers (three, anyway – we'll try to recover the mezzanine router too, if possible). If it turns out that the current ones are defective, we charge SLM for new routers. If the routers aren't defective, we're square. Hopefully we'll be able to do at least some of this this Monday.
I think this is right:
192.168.1.1 - primary - 00:14:bf:0e:57:21
192.168.1.10 - downstairs centre - 00:13:10:30:f8:a3
192.168.1.11 - downstairs north - 00:13:10:30:f8:a6
192.168.1.12? - mezzanine - 00:13:10:2d:a9:98
Edward reported that he'd lost his connection – was able to see the router but not get an outbound net link.
Michael P. could see the various routers using his PDA but not connect to any of them. A quick reboot and all was good.
One oddity is that authenticating from the primary router displayed a French version of the login screen. Jason Roks has reported the same periodically at some of the Queen W venues. Connecting later at the mezzanine was good and in English.
This was kinda brute force but got the market back online.
OneZone is littering the area with a tonne of their APs