Here's what I think is happening with the servers.
-
@hoboadam_psn said in Here's what's happening with the servers.:
@fletch62176_psn said in Here's what's happening with the servers.:
Honestly seems like it could be an issue with the backend database servers.
I'm saying no because the true answer of an unhandled server exception error in python, relates to the user (or in this case, console, ui or web) trying to access the data, the data being sent, and the user not accepting it. The user then tries to hit the data library again, and the data server says... I'm already here with your information silly.
I know it sounds stupid, but there's nothing wrong with the data. Most every case where you're getting booted, the game is saving your progress, you're not seeing it right away, but then it magically appears next time you successfully sync.
It does not usually save my progress. It usually loses it and I have to start over.
-
@vipersneak_psn said in Here's what's happening with the servers.:
@hoboadam_psn said in Here's what's happening with the servers.:
@fletch62176_psn said in Here's what's happening with the servers.:
Honestly seems like it could be an issue with the backend database servers.
I'm saying no because the true answer of an unhandled server exception error in python, relates to the user (or in this case, console, ui or web) trying to access the data, the data being sent, and the user not accepting it. The user then tries to hit the data library again, and the data server says... I'm already here with your information silly.
I know it sounds stupid, but there's nothing wrong with the data. Most every case where you're getting booted, the game is saving your progress, you're not seeing it right away, but then it magically appears next time you successfully sync.
It does not usually save my progress. It usually loses it and I have to start over.
RTTS issues are separate from the DD side. Everyone I've spoken to, hasn't lost any DD progress. It shows back up once you're back in.
-
@hoboadam_psn said in Here's what's happening with the servers.:
@dtownwarrior78_mlbts said in Here's what's happening with the servers.:
Well that's an exemplary explanation from an obviously intelligent person who seems to very much know what they are talking about. Now how do you get this info to SDS to make it more than just an explanation and an actual fix? This is where the current problem lies, as they have more than proven they don't even read these forums. Thanks for the knowledge, now SDS needs to hire you on to get this thing moving!
I'm not available for hire, but more than happy to troubleshoot anything related to programming issues. I'm 48. I can code COBOL. I can't do anything in python fast, but I can debug any code with the best of them.
Hey man, can you debug the current political government too?? LMAO. All kinds of Unhandled Server Exceptions happening every minute.
For real though, this explanation seems legit and is the best information we have gotten while there really hasn't been an explanation at all from them since 4/20 launch. I really hope this info is considered or at least gives SDS some inspiration to get an official, clear, concise explanation and possible target date. If they did that, people would stop flooding the forums.
-
@hoboadam_psn said in Here's what's happening with the servers.:
@vipersneak_psn said in Here's what's happening with the servers.:
@hoboadam_psn said in Here's what's happening with the servers.:
@fletch62176_psn said in Here's what's happening with the servers.:
Honestly seems like it could be an issue with the backend database servers.
I'm saying no because the true answer of an unhandled server exception error in python, relates to the user (or in this case, console, ui or web) trying to access the data, the data being sent, and the user not accepting it. The user then tries to hit the data library again, and the data server says... I'm already here with your information silly.
I know it sounds stupid, but there's nothing wrong with the data. Most every case where you're getting booted, the game is saving your progress, you're not seeing it right away, but then it magically appears next time you successfully sync.
It does not usually save my progress. It usually loses it and I have to start over.
RTTS issues are separate from the DD side. Everyone I've spoken to, hasn't lost any DD progress. It shows back up once you're back in.
Wanted to thank you, @hoboadam_PSN. Your explanation as to your take on the servers was pretty helpful. Judging by what you mentioned, and what I've experienced - I've had decent success tonight getting in and playing some. (It could be coincidence but....) You're right in that DD progress usually isn't lost. I've lost player stats in conquest, but most other progress has been saved before the "crash". And since spamming the buttons to try and get "back in" seems to make things worse, what has seemingly worked for me several times tonight is -not- doing something online and coming back in a few minutes. Each time, I've been able to get back to the DD menu and start playing again. Again, could be just a coincidence, but after days of frustration - I'll take it.
-
@hoboadam_psn said in Here's what's happening with the servers.:
I've taken quite a few tries at explaining the issue in various threads, so I'm going to make one myself, and offer to SDS to help. They have my contact information.
Stress balancing is broken on the server node for the menu interfacing.
There are send and receive functions embedded in every menu that send data after an action is performed, e.g., a conquest victory, a moment completion, etc. When you complete the task, data is sent from your console to a server and logged. For a user to see it at a splash screen post event, a call script reaches out to the data repository and shows your progress.
Whenever you are getting the unhandled server exception, it's like timing out on a ping. The game will then attempt a refresh, thus your little button icon in the bottom right of your screen.
Sometimes you get partial and get bounced back to the main screen of the menu you are in, and sometimes booted completely out to the main game menus.
Whenever you try to get back in to where you were, you are inadvertently creating additional traffic, because from your console's side its data hasn't been reconciled with the data server yet. On the server side this reconciliation usually has happened, which is why you aren't losing progress 95% of the time.
Take these errors and multiply them by the number of users on at any given time, the people banging buttons trying to get back in, and data cached with nowhere to display it because you're already out of that menu. SDS dumps the returned data when you are logged back in and at another menu, but it has to record the failed attempts.
It's an exponential growth problem that needs to be debugged: the function calls can't possibly get their returns within the call time. Why? Because Xbox Game Pass created too many users and too many user requests throughout the game.
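To put rough numbers on the pile-up described above, here is a minimal Python sketch. It is a toy model and not SDS's code: `total_requests` assumes a fixed failure rate and that every failed menu call is retried right away, while `backoff_delays` shows exponential backoff with jitter, a common client-side way of spreading retries out. Every name and number here is made up for illustration, and a real system's failure rate would climb with load, which this sketch does not model.

```python
import random


def total_requests(users, failure_rate, max_retries):
    """Expected request count when every failed call is retried immediately.

    Assumes (hypothetically) a constant failure rate per attempt.
    """
    total = 0.0
    attempts = float(users)
    for _ in range(max_retries + 1):
        total += attempts
        attempts *= failure_rate  # only the failed calls come back as retries
    return total


def backoff_delays(max_retries, base=1.0, cap=30.0, rng=None):
    """Exponential backoff with full jitter.

    Retry n waits a random time in [0, min(cap, base * 2**n)] seconds,
    so clients that failed together do not all retry together.
    """
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(max_retries)]


# 1,000 users, half of all calls failing, up to 3 button-mash retries each.
print(total_requests(1000, 0.5, 3))  # 1875.0 requests instead of 1000
print(backoff_delays(3, rng=random.Random(0)))
```

Even with these tame assumptions, the retries nearly double the traffic, which fits the "people banging buttons" part of the theory.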
Solutions?
1- Change some of the menus. Require us to only go to one spot to get anywhere. Right now for example, I can enter moments from inning programs and moment menus. How do I know this is an issue? When you access a moment from the Inning screen, you get taken all the way out afterwards. If you were in the moments menu, you get taken back out to the moment menu. It's all in the coding.
2- Dedicate more nodes and bandwidth to menu navigation calls. The game plays awesome. Whenever you're actually doing anything, it's smooth as silk. All the troubles are menus. XP, parallels, etc. Data is routing, being stored and ultimately being credited properly.
3- Turn off the web site marketplace until you eliminate the console errors. It will hurt and stink, but you need a cleaner production environment to test your fixes.
Ultimately, SDS is going to have to reboot servers every day because these issues must be fixed in a production environment. They need to trim down the production environment in order to resolve them, though.
Gotta walk before you can run.
I'd bet a good bottle of 1986 Chateau Rothschild that I'm on point with my diagnosis and potential fixes. I've fixed this type of code before for Morgan Stanley and know what I'm talking about.
Thank you for your time.
It's almost as if... a software engineer... should have been able to foresee this....
-
Bump because people need to read this and understand that there is more to it than a simple fix. Thx brother for posting this. Maybe we will see fewer "server is broke" type threads.
-
Can I get a tldr version for dumb people?
-
@ikasnu_psn said in Here's what's happening with the servers.:
Can I get a tldr version for dumb people?
Sure.
Trevor Story did it.
-
@ikasnu_psn said in Here's what's happening with the servers.:
Can I get a tldr version for dumb people?
My theory...
Game works fine.
Coding behind menus is in need of CPR.
Servers aren't broken.
They'll fix it.
Hopefully my details and theory help inspire somebody at SDS on the approach.
There ya go brother.
-
Here's an additional bit of info you might find interesting. I've been keeping an eye on their API and developing a bit with it, and noticed a change that appears to be correlated with their attempts to fix server load. The listings.json file recently had a trim to the data it was sending. It previously contained completed order data that is still available in listing.json. listings.json would previously send back completed order data for 25 cards at a time, but that went away.
I actually have a backup of the file from when I was working on it. Here's an example of listings.json from 4/21. Compare that with the file sent back today.
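If you want to check this kind of trim yourself, here's a minimal Python sketch that lists which keys exist in an older snapshot but not in a newer one. The two snapshots below are made-up stand-ins, and the field names in them (`listings`, `name`, `best_sell_price`, `completed_orders`, `price`) are placeholders I picked for illustration, not a claim about what the real file contains.

```python
import json


def key_paths(node, prefix=""):
    """Collect every key path found in a parsed JSON value."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            # "[]" marks "any element of this list"
            paths |= key_paths(item, f"{prefix}[]")
    return paths


def removed_keys(old_text, new_text):
    """Key paths present in the old snapshot but missing from the new one."""
    old_paths = key_paths(json.loads(old_text))
    new_paths = key_paths(json.loads(new_text))
    return sorted(old_paths - new_paths)


# Hypothetical stand-ins for an older backup and a current response.
old_snapshot = (
    '{"listings": [{"name": "Card A", "best_sell_price": 100,'
    ' "completed_orders": [{"price": 95}]}]}'
)
new_snapshot = '{"listings": [{"name": "Card A", "best_sell_price": 100}]}'

print(removed_keys(old_snapshot, new_snapshot))
```

To use it on real files, read the backup and the fresh response as text and pass both to `removed_keys`.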
I think you are right with some of your assumptions and this issue appears to be at the architecture level. I'm not so sure this is a load balancing issue, as you were insinuating, but rather a database/worker one. I think this was also evidenced directly at launch. Users were canceling orders and not receiving credit back for the cancelations. This isn't simply an issue of too many get requests or straight bandwidth at nodes. Database and worker load would be my bet to point the finger. Take what I say with a grain of salt. I have a fairly base level understanding of how the stack interacts without specific intimate detail at any layer. I think there is enough evidence though that I can, with some level of confidence, point there.
I don't know how you fix that issue though without simply waiting for or forcing the volume to decrease. It's not like they are going to have the ability to make the architectural changes needed on the fly in the production environment in a short period of time.
You are correct however that this isn't a "game server" issue. The game and online connection, once the connection is made and data is synchronized, works flawlessly.
-
@hoboadam_psn said in Here's what I think is happening with the servers.:
@fletch62176_psn said in Here's what's happening with the servers.:
Honestly seems like it could be an issue with the backend database servers.
I'm saying no because the true answer of an unhandled server exception error in python, relates to the user (or in this case, console, ui or web) trying to access the data, the data being sent, and the user not accepting it. The user then tries to hit the data library again, and the data server says... I'm already here with your information silly.
I know it sounds stupid, but there's nothing wrong with the data. Most every case where you're getting booted, the game is saving your progress, you're not seeing it right away, but then it magically appears next time you successfully sync.
Everything you’ve said is awesome and on point, until this. The game isn’t always saving progress and making the data exchange. Maybe there’s underlying data interruption, but there’s a gazillion examples of games (not even online games) not saving and causing added frustration, ie conquest games, showdown, MTO.
Please don’t take this as combative, because you’ve done more than SDS has ever done: given a reasonable explanation to temper frustration. I’m just looking at this holistically and saying there’s more than just data exchange errors.
-
@hoboadam_psn said in Here's what I think is happening with the servers.:
I'm saying no because the true answer of an unhandled server exception error in python, relates to the user (or in this case, console, ui or web) trying to access the data, the data being sent, and the user not accepting it. The user then tries to hit the data library again, and the data server says... I'm already here with your information silly.
I know it sounds stupid, but there's nothing wrong with the data. Most every case where you're getting booted, the game is saving your progress, you're not seeing it right away, but then it magically appears next time you successfully sync.
It's not that there is nothing wrong with the data. I think that database reads or writes are not taking place quickly enough: either memory resources or disk I/O are being slammed too hard. One thing I noticed today when playing Conquest is that at the end of the game, if I fly through the ending screens to get to the next game, I see unhandled exceptions nearly every time. If I slowed down the end of game process, and took a minute to look at parallels or progress for example before getting back to the conquest map, then I could keep playing without ever getting an exception. Or they could be using expensive SQL queries, which would for sure be an application issue. Just my thoughts....
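Here's a toy Python sketch of what I mean by reads arriving before the write has landed. It is purely a guess at the pattern, since I have no access to their code: `SlowStore`, `finish_game`, the 5 second write latency and the `conquest_win` key are all hypothetical.

```python
class SlowStore:
    """Toy data store whose writes only become visible after `write_latency` seconds."""

    def __init__(self, write_latency):
        self.write_latency = write_latency
        self.pending = {}    # key -> (value, time at which the write becomes visible)
        self.committed = {}  # key -> value, visible to reads

    def write(self, key, value, now):
        self.pending[key] = (value, now + self.write_latency)

    def read(self, key, now):
        # Commit every pending write whose latency has elapsed.
        for k, (value, ready_at) in list(self.pending.items()):
            if now >= ready_at:
                self.committed[k] = value
                del self.pending[k]
        if key in self.pending:
            # The write exists but is not visible yet: the caller came back too fast.
            raise TimeoutError(f"{key} not reconciled yet")
        return self.committed[key]


def finish_game(store, seconds_on_end_screens):
    """Save a win at t=0, then request it after the player leaves the end screens."""
    store.write("conquest_win", 1, now=0.0)
    try:
        return store.read("conquest_win", now=seconds_on_end_screens)
    except TimeoutError:
        return "unhandled server exception"


print(finish_game(SlowStore(write_latency=5.0), 1.0))   # rushing through the screens
print(finish_game(SlowStore(write_latency=5.0), 60.0))  # taking a minute first
```

In this model the data is never lost. The rushed read just arrives before the write is visible, which would match seeing the exception only when I fly through the end screens.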
-
@hoboadam_psn said in Here's what I think is happening with the servers.:
@vipersneak_psn said in Here's what's happening with the servers.:
@hoboadam_psn said in Here's what's happening with the servers.:
@fletch62176_psn said in Here's what's happening with the servers.:
Honestly seems like it could be an issue with the backend database servers.
I'm saying no because the true answer of an unhandled server exception error in python, relates to the user (or in this case, console, ui or web) trying to access the data, the data being sent, and the user not accepting it. The user then tries to hit the data library again, and the data server says... I'm already here with your information silly.
I know it sounds stupid, but there's nothing wrong with the data. Most every case where you're getting booted, the game is saving your progress, you're not seeing it right away, but then it magically appears next time you successfully sync.
It does not usually save my progress. It usually loses it and I have to start over.
RTTS issues are separate from the DD side. Everyone I've spoken to, hasn't lost any DD progress. It shows back up once you're back in.
Well they are wrong. I only play DD and have lost items and progress.
-
@vipersneak_psn said in Here's what I think is happening with the servers.:
@hoboadam_psn said in Here's what I think is happening with the servers.:
@vipersneak_psn said in Here's what's happening with the servers.:
@hoboadam_psn said in Here's what's happening with the servers.:
@fletch62176_psn said in Here's what's happening with the servers.:
Honestly seems like it could be an issue with the backend database servers.
I'm saying no because the true answer of an unhandled server exception error in python, relates to the user (or in this case, console, ui or web) trying to access the data, the data being sent, and the user not accepting it. The user then tries to hit the data library again, and the data server says... I'm already here with your information silly.
I know it sounds stupid, but there's nothing wrong with the data. Most every case where you're getting booted, the game is saving your progress, you're not seeing it right away, but then it magically appears next time you successfully sync.
It does not usually save my progress. It usually loses it and I have to start over.
RTTS issues are separate from the DD side. Everyone I've spoken to, hasn't lost any DD progress. It shows back up once you're back in.
Well they are wrong. I only play DD and have lost items and progress.
Read through the forums. Tons of complaints about losing progress in DD. Everyone knows this. I play a conquest game and win, then get the exception error and have to play the game all over again. Now, if you are speaking of online games against another player then I have not done that yet, but that is not called DD. If that is what you mean, then you should be calling it RS (ranked season). DD includes many types of play. I even won Showdown (part of DD) and it was lost forever after an unhandled server exception error. That is progress lost. Most of what you are saying may be true, but not this part. People continuously lose progress in DD.
-
Once again, nothing I am saying is 100% applicable to every situation. I wouldn't know because I don't have access to their code. I can only work off what I read from others, what folks in my discord describe to me, and then what I experience myself.
Few notes in recent replies.
I have played conquest almost exclusively since launch. I despise showdown, have played 16 ranked games, maybe 2 event and zero BR. I have not lost data or progress once. I'm sitting at 292k XP and only have half the USA map left.
There's DEFINITELY a pattern to the errors and they're DEFINITELY tied to navigation within the menus. I agree that within conquest, if you let the post game play out and exit to the next game efficiently and not like a spaz machine, you typically stay connected. I streamed my gameplay for 4 hours last night. If I put the controller down for 10 minutes and resume the game, I'm almost always booted post game.
Now what do these events, posts and discussion tell me? They confirm that user traffic from navigating menus is the primary culprit for the errors.
Now here's stuff that I am starting to assume, based on game experience and these errors.
There are different servers storing data and progress for different modes. Market is entirely separate and I'm not touching that today. Day 1 issue, I have a theory there too.
I cannot speak to RTTS. I think they have data, progress etc on a "baby" server. Storage requirements are low. Traffic right now is high there because of the overhaul.
Moments and Showdown are together. These modes are being played the most right now. Naturally traffic is going to be highest here.
Ranked, BR, and other H2H are together. BR suffers the most from these issues. That's where the most navigation clicking occurs and where communication is high between the console and servers.
Conquest, play vs CPU etc are together. Offline stats are being stored so this is why it's separate. Once again, I have TONS of experience in this area and I was able to play games for 3.5 out of 4 hours last night when navigating at a consistent speed.
Ultimately, no matter what mode you are in and which data repository you are communicating with, we ALL end up hitting the same navigation portals eventually, either by choice or through an unhandled server exception, because the communication within our mode can't be reconciled when you go too fast, too slow or wait too long on the user end.
Everything comes back to the traffic load. Gamepass created a huge influx of "navigators". Technical test went well from a navigation perspective, but if you recall, we were down most of day 1 and 2. I'd bet this issue was masked by the bigger ones and thus overlooked.
I appreciate all of the feedback on my theory. I know I'm not 100% right. That's the beauty of brainstorming. I respect and appreciate everyone's take on this.
-
@hoboadam_psn said in Here's what I think is happening with the servers.:
Thank you for your time.
No, thank you for YOUR time. Excellent post. And I thought you were just a pretty face...
-
"I have played conquest almost exclusively since launch. I despise showdown, have played 16 ranked games, maybe 2 event and zero BR. I have not lost data or progress once. I'm sitting at 292k XP and only have half the USA map left."
That is amazing. At least half the time I get no XP or game win credit when playing conquest. I do not get why there is such a difference. If I get a server exception error I always lose credit for a win. I pray after each game that I get credit for it.
-
That is a very detailed analysis, nice work.
But you forgot to carry the 1, because what all of this adds up to is the corporate bigs of SDS being ongoing cheapskates