[plug] Weird Debian Testing problem

Onno Benschop onno at itmaze.com.au
Sun Jun 1 17:30:14 WST 2003


On Sun, 2003-06-01 at 16:35, Ryan wrote: 
> Does logging in as a different user have any effect?  Create a 
> new user if you have to.  If it does, move the contents of your 
> home directory elsewhere, login - enjoy the stability and then 
> put everything not already there back :)

I did log in as a different (existing) user and saw no crashes, but I'm
not yet convinced that means anything. Before I typed my first message
I'd had to reboot five times in 15 minutes; since then the machine has
stayed up (uptime is 1:39 at the moment).

On Sun, 2003-06-01 at 16:41, Craig Ringer wrote:
> OK. First: are things OK on the console? Can you "apt-get install mutt" 
> and make sure you have working email on the console - might make things 
> easier later. If things are working ok on the console, consider the 
> possibility of breakage in your X environment. Try a different window 
> manager, ideally something very simple like twm (*uggh*).

The console has never had any problems, I can always switch back and
forth (that is, unless I'm trying to get my Clie memory stick mounted,
but that's a different issue I feel.)

As for your suggestion about a different window manager, let me throw in
another observation - from memory, because I've not copied the string
when it actually happened - xhost gives back some weird responses:

Normally it says:
access control enabled, only authorized clients can connect
LOCAL:

But after a freeze/crash/weirdness, it returns something about cannot
connect to display ""

> Most likely this will be it - breakage in your X environment. Especially 
> if you're using something "fragile" like GNOME or KDE, where a lot of 
> different things have to play well together for it to work properly.

Yeah, now the trick is finding where :-(

> I suggest that you do a "ps aux | less" and look over the output, 
> keeping an eye out for zombie or "hard" sleeping processes. A zombie 
> will look like this:
> 
> craig     5859  0.0  0.2  7168 2236 pts/2    Z    13:53   0:00 bash
> 
> and a process in uninterruptible sleep:
> 
> craig     5859  0.0  0.2  7168 2236 pts/2    D    13:53   0:00 bash
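
For the archive, a minimal sketch of that check as a shell filter. The
function name stuck_procs is made up for illustration, and it assumes the
"ps aux" column layout shown above, where the process state is the eighth
field:

```shell
# Print PID, state and command for processes whose state starts with
# Z (zombie) or D (uninterruptible sleep).
# Reads "ps aux"-style text on stdin, so it also works on saved output.
# The header line is skipped naturally: its eighth field is "STAT".
stuck_procs() {
    awk '$8 ~ /^[ZD]/ { print $2, $8, $11 }'
}

# Typical use:
#   ps aux | stuck_procs
```

An empty result means nothing is currently in either state.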

I've not been able to run this after a crash, because the machine is
still up, but right now nothing is a zombie [my machine died here]



[I'm baaack...]

OK, when the machine died, I recalled Craig's comments about strace. I
didn't have it installed, but because a terminal was still running, I
could install it.

When I did, I noticed that the last thing any gnome application did was
open .esdauth, read/write some bytes, and sit there.
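
A minimal sketch of that kind of check, assuming the trace has been saved
to a file first. The helper name trace_tail and the path /tmp/trace are
illustrative only; -o and -p are standard strace options:

```shell
# A trace of a running process could be saved with, for example:
#   strace -o /tmp/trace -p <pid>
#
# Show the last few lines of a saved strace log, then every line that
# mentions .esdauth. The line count defaults to 5.
trace_tail() {
    file=$1
    count=${2:-5}
    echo "-- last $count calls --"
    tail -n "$count" "$file"
    echo "-- .esdauth --"
    grep -F '.esdauth' "$file" || echo "(no matches)"
}

# Typical use:
#   trace_tail /tmp/trace 10
```

If the final line is an unfinished call, that is usually where the process
is sitting.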

I recalled getting sound to work recently and turning on the sound
server.

When I killed esd, all sprang back into life.

So, thanks for both your suggestions. If anything, it made me want to
keep looking for the problem.

Now I've got a workaround. I still don't really know why it works, but
at least I can google my way out - I hope - and I'll let you know what
happened.

So thanks again for giving me enough incentive to keep hunting.

(For completeness and the archive, I'll address the remainder also:)

> To check the condition of the disk, try running "smartctl -a /dev/hdx" 
> where hdx is your main HDD (repeat for all HDDs in the system). Look for 
> logged errors at the end of the output, bad sectors, high ATA error 
> counts or UDMA error counts, etc. If it reports that S.M.A.R.T is not 
> enabled, try "smartctl -e /dev/hdx" then retry the -a query. If it still 
> doesn't work - you probably have old drives or an old BIOS, and won't be 
> able to use the disk's self diagnostics.

That gave me no errors.
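
Checking several drives in one go might look like the sketch below.
smart_report is a hypothetical helper, the device names are placeholders,
and only the -a query quoted above is used:

```shell
# Run "smartctl -a" for each device given as an argument.
# Needs smartctl installed and, normally, root privileges.
smart_report() {
    if ! command -v smartctl >/dev/null 2>&1; then
        echo "smartctl not found" >&2
        return 1
    fi
    for dev in "$@"; do
        echo "== $dev =="
        smartctl -a "$dev"
    done
}

# Typical use (substitute your own drives):
#   smart_report /dev/hda /dev/hdb
```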

> Also, try doing an "strace" on a process to see what's holding it up. 
> Just run "strace programname arguments" where you'd normally run 
> "programname arguments". It can be useful to do something like
> 	strace xterm 2>&1 | tee /tmp/trace
> so you can see what's going on and log it for later processing as well. 
> It looks like gibberish, but I've found it an invaluable debugging tool 
> in determining what's going wrong with an app, and where. Strace doesn't 
> work properly on multithreaded apps like mozilla and openoffice.

Yay! This was what gave the game away!

> 'luck

With friends like Craig and Ryan, who needs it :-)

Thanks again.


Onno Benschop 

Connected via Optus B3 from S33:37'33" - E115:07'30" (Dunsborough, WA)
-- 
()/)/)()        ..ASCII for Onno.. 
|>>?            ..EBCDIC for Onno.. 
-- -. -. --   ..Morse for Onno.. 

Proudly supported by Skipper Trucks, Highway1, Concept AV, Sony Central, Dalcon
ITmaze   -   ABN: 56 178 057 063   -  ph: 04 1219 8888   -   onno at itmaze.com.au


