
Re: BESSRC experience with Spec/Epics




On 11/19/02 4:53 PM, 'Tim Mooney' <mooney@aps.anl.gov> wrote:

> 
> I haven't heard from users directly reporting long CA timeouts with spec,
> though I have heard of problems thought to be CA related--I think BESSRC
> may have seen some of this.

Yes, we have, so much so that we have largely given up using EPICS with spec.

> 
> I agree that applications should not be expected to fix problems with network
> hardware, although they should make as much use as they can of error returns
> and connection-management messages from CA.

I'm reasonably convinced that our problems are not caused by network
hardware.

> 
> I talked with Gerry yesterday, and have the impression that spec's doing the
> right things (I'm not a CA guru): when it does a ca_put() or ca_get() (the
> non-callback version) it calls ca_pend_io() with a user-specified timeout.
> It's ok for this timeout to be quite long, because ca_pend_io() will return
> as soon as it receives server replies to all the outstanding non-callback
> requests.  Also, spec calls ca_pend_event() frequently (with a very short
> time value, because ca_pend_event() will never return before the specified
> time has elapsed), so CA should be getting enough processor time to do its
> business.
> 
> My understanding is that it's possible for CA to simply not send some messages
> if its 'send' buffer runs out of space and new messages continue to be added.

I didn't know this.  Is this on the client?  It would be consistent with
what we see.  I get the impression that the errors are more likely when spec
has a large number of EPICS motors defined, and specifically during the burst
of activity that occurs when leaving spec's 'config' screen.
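
To make the buffer idea concrete, here is a toy Python model of a
fixed-size send buffer.  It is only a sketch of the behaviour described
above, not CA's actual implementation: the class BoundedSendBuffer, the
function send_burst, and the capacity of 8 are all made up for
illustration.  It shows that a burst queued without any intermediate
flush loses everything past the capacity, while the same burst flushed
periodically gets through.

```python
from collections import deque


class BoundedSendBuffer:
    """Hypothetical fixed-capacity send buffer that drops messages when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = deque()   # queued but not yet sent
        self.sent = []           # messages that made it onto the wire
        self.dropped = []        # messages silently lost on overflow

    def queue(self, msg):
        """Queue a message; return False if it was dropped."""
        if len(self.pending) >= self.capacity:
            self.dropped.append(msg)
            return False
        self.pending.append(msg)
        return True

    def flush(self):
        """Send everything currently queued."""
        while self.pending:
            self.sent.append(self.pending.popleft())


def send_burst(buf, messages, flush_every=None):
    """Queue a burst, optionally flushing after every `flush_every` messages.

    Returns (number sent, number dropped).
    """
    for i, msg in enumerate(messages, start=1):
        buf.queue(msg)
        if flush_every and i % flush_every == 0:
            buf.flush()
    buf.flush()
    return len(buf.sent), len(buf.dropped)


if __name__ == "__main__":
    burst = [f"motor{n}" for n in range(20)]
    print(send_burst(BoundedSendBuffer(8), burst))
    print(send_burst(BoundedSendBuffer(8), burst, flush_every=8))
```

If the real client behaves anything like this, pacing the requests sent
on leaving the 'config' screen would be one thing to try.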

> It's also possible for CA to get insufficient CPU time to handle all the
> messages it's intended to handle.  This could mean that a request doesn't
> get sent, that a sent request doesn't get received, that an acknowledge
> doesn't get sent, or that a sent acknowledge doesn't get received.

How much CPU time does it want?  I've seen the problem on a dual 1.5GHz
Athlon system, which should be adequate for talking to a 25MHz 68040!

> As you note, there doesn't seem to be a way for the client always to know
> what has occurred.  What could a client do in this case other than complain
> to the user or retry the operation (if the operation /can/ be retried)?
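
For what it's worth, the "retry if it can be retried" option could look
roughly like the Python sketch below.  Everything in it is hypothetical:
RequestTimeout stands in for whatever timeout error the client library
reports, and the `idempotent` flag is an assumed way of marking
operations that are safe to repeat (reading a position is; a relative
motor move is not).  An operation that can't be retried is tried once,
and the error is passed up so the user can be told.

```python
import time


class RequestTimeout(Exception):
    """Hypothetical error raised when no reply arrives within the timeout."""


def retry(operation, attempts=3, delay=0.0, idempotent=True):
    """Call operation(), retrying on RequestTimeout when that is safe.

    Operations marked non-idempotent are attempted exactly once.  If every
    attempt times out, the last RequestTimeout is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    tries = attempts if idempotent else 1
    last_error = None
    for attempt in range(1, tries + 1):
        try:
            return operation()
        except RequestTimeout as err:
            last_error = err
            if attempt < tries:
                time.sleep(delay)  # brief pause before trying again
    raise last_error
```

This doesn't tell the client what actually happened to the lost request;
it only limits the damage for operations that are harmless to repeat.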


Guy Jennings

BESSRC CAT