Discussion:
[p4] p4 monitor terminate never seems to work
UnstoppableDrew
2014-09-17 14:00:01 UTC
Posted on behalf of forum user 'UnstoppableDrew'.

I've been noticing that the monitor terminate command doesn't seem to
work most of the time, at least when it comes to sync commands. One of my
Jenkins jobs was hanging trying to sync, so I killed the job. However the sync
process was still going, so I used p4 monitor terminate to try and kill it. 17
hours later, it's still going. Normally I would go onto the Perforce server
itself to take out the rogue process, but at my current job while I'm a p4
super user, I don't have access to the server.



--
Please click here to see the post in its original format:
http://forums.perforce.com/index.php?/topic/3581-p4-monitor-terminate-never-seems-to-work
P4Shimada
2014-09-18 00:15:01 UTC
Posted on behalf of forum user 'P4Shimada'.

Hi,

Sorry to hear that you have hung sync commands.
UnstoppableDrew
2014-09-18 13:30:01 UTC
Posted on behalf of forum user 'UnstoppableDrew'.

For additional data points, I'm going through a local proxy, and it looks
like we're using a broker as well based on p4 info:

Server address: localhost:1667
Server root: /p4/1/root
Server date: 2014/09/18 06:18:53 -0700 PDT
Server uptime: 124:18:09
Server version: P4D/LINUX26X86_64/2013.1/685046 (2013/08/07)
Broker address: perforce-new1:1666
Broker version: P4BROKER/LINUX26X86_64/2013.1/659207
Proxy address: pforce.mycompany.com:1999
Proxy version: P4P/LINUX26X86_64/2013.1/610569 (2013/03/19)

It seems like our office is particularly prone to this problem; one of the
developers here has at least half a dozen processes that are running forever. At
least a couple were caused by starting a sync, then killing it with ^C.

Looking at the proxy's log I see a bunch of these:

Perforce proxy error:
Date 2014/09/17 23:01:15:
Connection from 10.95.7.39 broken.
TCP receive failed.
read: socket: Connection timed out

There were 4 identical entries each for 23:01:14 & 23:01:15.



P4Shimada
2014-09-19 20:05:01 UTC
Posted on behalf of forum user 'P4Shimada'.

Thank you for sending the proxy error message from your log along with your
Perforce system info.

The error message means that the connection between the server and client was
unexpectedly terminated or that the client exited at an unexpected
time. This can be caused by someone killing off the client, a user
issuing CTRL-C, a network link dropping, a server reboot, etc.

Try the "p4 lockstat" command to find out whether Perforce is running and
whether the troublesome sync command is locking the database. See the
following for an example of how to use this command on a hung server:

    http://answers.perforce.com/articles/KB_Article/Fixing-a-hung-Perforce-server

In general, you can run:

    p4 lockstat -c <client>

to see whether the Perforce database is locked. To confirm whether a process is
still locked, run:

    ps -elf | grep p4d

In any case, you can run:

    p4 monitor terminate <pid>

Let us know if this frees up your Perforce system again and whether this works
for you.
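
As a rough illustration of the last step, here is a minimal sketch that only prints the terminate commands for one client workspace instead of running them. It assumes the monitor listing has the columns id, status, user, elapsed time, command (as in typical `p4 monitor show -a` output); the function name `terminate_cmds_for_client` is made up for this example.

```shell
# Sketch: print (do not run) a terminate command for every monitor entry
# whose arguments mention the given client workspace and which is not
# already marked for termination (status T).
# Input: `p4 monitor show -a` style text on stdin; the column layout
# (id, status, user, elapsed, command) is an assumption.
terminate_cmds_for_client() {
    client="$1"
    awk -v pat="//$client/" '
        index($0, pat) > 0 && $2 != "T" { print "p4 monitor terminate " $1 }'
}
```

Something like `p4 monitor show -a | terminate_cmds_for_client <client>` would then list the commands so they can be reviewed before anything is actually terminated.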

REFERENCES

http://answers.perforce.com/articles/KB_Article/Client-Workspace-and-Global-Metadata-Locks

http://answers.perforce.com/articles/KB_Article/Common-Questions-about-P4D-Processes-on-Unix-Systems/

http://answers.perforce.com/articles/KB_Article/Killing-Perforce-Server-Child-Processes/

http://answers.perforce.com/articles/KB_Article/Using-lsof-to-diagnose-and-fix-a-hung-server/



UnstoppableDrew
2014-09-19 21:00:01 UTC
Posted on behalf of forum user 'UnstoppableDrew'.

Ok, so it looks like the clients database is the problem here:

p4 lockstat -C
Read : clients/aem_gem
Read : clients/jenkins_gem-master-96886
Read : clients/jenkins_gne_1000
Read : clients/jenkins_gne_1100
Read : clients/jenkins_xms_master
Read : clients/service.engweb_bccm-jenkins

All the stuck jobs are in one of those clients:

1233 T buildmaste 24:59:48 sync -f //jenkins_xms_master/***@636431
9316 T buildmaste 71:58:02 sync //jenkins_gem-master-96886/***@629941
15293 R buildmaste 01:59:12 sync //jenkins_gne_1000/***@636273
15525 T drew.marol 48:54:17 client -d aem_gem
6058 R buildmaste 24:19:19 sync -f //jenkins_xms_master/***@636431
22212 T service.en 71:17:37 sync //service.engweb_bccm-jenkins/***@629965
23675 R buildmaste 01:51:15 sync -f //jenkins_gne_1100/***@636286
27779 T drew.marol 50:07:47 change -i
29045 T drew.marol 71:40:59 sync c:\p4\aem_gem\...#head

Unfortunately, monitor terminate doesn't help, which is why I started this
thread in the first place. You can see most of these are already marked for
terminate, and the ones that have been running for 71 hours now were marked as
such within the first hour, some much less than that. In previous jobs, I have
used p4d -c "kill <PID>" directly on the server to handle these
things, but in the current job, I do not have direct access to the Perforce
server.
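
To pick the long-stuck entries out of a listing like the one above, a small filter along these lines can help. This is only a sketch: it assumes the columns are id, status, user, elapsed time (HH:MM:SS) and command, exactly as pasted here, and the function name `stuck_terminated` is invented for the example.

```shell
# Sketch: list monitor entries in status T (marked for termination)
# that have been running for at least MIN_HOURS (default 1).
# Reads `p4 monitor show -a` style text on stdin; the column layout is
# assumed from the listing in this post.
stuck_terminated() {
    min_hours="${1:-1}"
    awk -v min="$min_hours" '
        $2 == "T" {
            split($4, t, ":")              # elapsed time is HH:MM:SS
            if (t[1] + 0 >= min + 0) print $1, $4, $3
        }'
}
```

For example, `p4 monitor show -a | stuck_terminated 48` would print the id, elapsed time and user of every entry that was marked for terminate and is still around after two days.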



Tunga Mavengere.
2014-09-19 23:10:01 UTC
Posted on behalf of forum user 'Tunga Mavengere.'.

I'm interested to find out the solution to this problem as well. I have seen p4
monitor terminate take hours to terminate processes. I have no details I
can share, but I know this is a problem that occurs sometimes.



G Barthelemy
2014-09-23 14:20:01 UTC
Posted on behalf of forum user 'G Barthelemy'.



[http://forums.perforce.com/index.php?app=forums&module=forums&section=findpost&pid=15288]
> Unfortunately, monitor terminate doesn't help, which is why I started this
> thread in the first place. You can see most of these are already marked for
> terminate, and the ones that have been running for 71 hours now were marked as
> such within the first hour, some much less than that. In previous jobs, I have
> used p4d -c "kill <PID>" directly on the server to handle these things, but in
> the current job, I do not have direct access to the Perforce server.

You will find that killing the client process will let the process on the server
terminate. I have always been wary of killing hung syncs on the server for
fear of DB corruption, although I have to admit this is more out of superstition
than rationality: by then the sync is no longer streaming data to the client
and, crucially, it is no longer writing to the database. Using p4d -c "kill
<PID>" is a great idea. I must say that without access to the
server, even tracing the client PID (e.g. using netstat -p and matching ports)
is not going to be straightforward if the client is on a busy shared host (and
the chain is longer to follow if you use proxies and brokers).
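
For what it's worth, the port-matching step could be sketched like this on the client host. It assumes Linux `netstat -tnp` output (columns: proto, recv-q, send-q, local address, foreign address, state, PID/program); the function name `p4_connections` is made up, and the port you pass is whatever your client actually connects to (1999 for the proxy in the p4 info shown earlier in this thread).

```shell
# Sketch: from `netstat -tnp` text on stdin, print local address,
# remote address and PID/program for every ESTABLISHED connection
# whose remote end is on the given port.
# The netstat column layout is an assumption (Linux net-tools format).
p4_connections() {
    port="$1"
    awk -v suffix=":$port" '
        $6 == "ESTABLISHED" &&
        substr($5, length($5) - length(suffix) + 1) == suffix {
            print $4, $5, $7
        }'
}
```

Running `netstat -tnp | p4_connections 1999` as the owning user (or root) should then show which local process holds each connection, and the local port can be matched against what the proxy or server sees. With a proxy and a broker in the chain, the same matching has to be repeated at each hop.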

I have found that there is always a valid explanation when a process squats in the
monitor table even after being p4-terminated, but in the case of syncs I have never got
to the bottom of the underlying reason, for lack of time. With syncs, it is
often caused by the client issuing an INT or TERM signal without actually exiting
the client process (often in scripts or client applications): the sync stops,
the peer p4d process no longer accesses the database, but the socket pair
stays open and the two ends seem to keep each other alive (tcpdump / wireshark show that both
send each other short packets at regular intervals and neither will time out).
There is a TCP-ish flavour to this issue; my gut feeling is that it's not
necessarily just at the application layer. We seem to have this issue
exclusively with remote sites connecting to the Perforce server through TCP
accelerators, but that could be just a coincidence.

Now sometimes processes hang simply because they depend on others. For
example, just a few days ago a user caused a sync to hang. He then proceeded to
delete his problematic client from P4V, not once but 4 times, probably because
the client just would not go away. My guess is that there was a client lock
on it due to the sync (lockstat -c or -C is not related to database table
access, by the way; it just reports on client locks). He eventually exited P4V.
The 4 "client -d" commands remained in the monitor table (with no TCP
peer at the client end) and the client spec was still in the database. I made
him kill his script related to the sync. As soon as he did, all the related p4d
processes exited gracefully, including the 4 "client -d" (the first
one of which actually deleted the client spec from the DB).

Another scenario where processes can't be p4-terminated: someone is in the
middle of editing a spec and their window manager dies, for example. Again, this
is understandable (if the editor process did not exit) and is cleaner to resolve
at the client end...

I used to be a little OCD with processes that were just hanging in the monitor
table, but now I tend to let them go away by themselves. Maybe it's an age
thing :-) Clients eventually reboot their machines, etc... As long as your
server is not busy to the point where it threatens to run out of PIDs, of
course...



P4Shimada
2014-09-23 18:25:01 UTC
Posted on behalf of forum user 'P4Shimada'.

Hi Drew,

Thanks for sending your output of lockstat and showing the commands marked for
terminate. Since your server is earlier than 2013.2, you may want to disable
server.locks.dir until going to the 2013.3+ versions. (With 2013.2 or later
servers, administrators may set server.locks.sync=0 to specify that the
sync command not take the client workspace lock at all; at the default setting
of 1, the client workspace lock is taken in shared mode as before.)

The following doc shows how to disable this:

    http://answers.perforce.com/articles/KB_Article/Client-Workspace-and-Global-Metadata-Locks
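
As a hedged sketch of how the two settings map to server releases, the helper below prints (does not run) the configure command that would apply, based on the "Server version" line from p4 info. The function name `suggest_lock_setting` is invented, the 2013.2 threshold is taken from the note above, and the `disabled` value for server.locks.dir is my understanding of how that configurable is turned off; please confirm both against the KB article before changing a live server.

```shell
# Sketch: given the "Server version" value from `p4 info`, print the
# lock-related configure command discussed in this post.
# Assumptions: the release is the third "/"-separated field, and
# server.locks.sync is only available from 2013.2 onwards.
suggest_lock_setting() {
    # e.g. $1 = "P4D/LINUX26X86_64/2013.1/685046 (2013/08/07)"
    release=$(printf '%s\n' "$1" | awk -F/ '{ print $3 }')
    year=${release%%.*}
    minor=${release#*.}
    if [ "$year" -gt 2013 ] || { [ "$year" -eq 2013 ] && [ "$minor" -ge 2 ]; }; then
        echo "p4 configure set server.locks.sync=0"
    else
        echo "p4 configure set server.locks.dir=disabled"
    fi
}
```

A super user would run the printed command themselves once they are happy with it; the helper deliberately never touches the server.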


