On the Smarkets betting exchange Biden’s chances edge to record levels – politicalbetting.com

Comments

  • tlg86tlg86 Posts: 26,176
    Oh dear, the connection to Sunak’s speech has dropped off.
  • MalmesburyMalmesbury Posts: 50,366
    dixiedean said:

    So the spreadsheet they were using for the results reached its maximum size and simply excluded all the results that followed.

    Epic fail.

    WTAF? Who works with large datasets and doesn't know that Excel has a maximum file size? I despair.
    Who works with large datasets in Excel?!?
    HMG.
    It's not a particularly large dataset. Not by modern standards.

    Weaning organisations off Excel is hard. The whole planet is infested with the stuff.
  • NigelbNigelb Posts: 71,222

    Scott_xP said:

    The Donald is awake and Tweeting.

    Back to the Whitehouse today?

    "IF YOU WANT A MASSIVE TAX INCREASE, THE BIGGEST IN THE HISTORY OF OUR COUNTRY (AND ONE THAT WILL SHUT OUR ECONOMY AND JOBS DOWN), VOTE DEMOCRAT!!!"

    It's not him.

    It uses capital letters, and it says stupid shit, but there's more than that to a Trump tweet, there's a certain characteristic je ne sais quoi.

    You'd think it would be easy to copy but it's not, it's like trying to imitate Picasso or Ave It.
    Or SeanT...
  • MaxPBMaxPB Posts: 38,868

    MaxPB said:

    MaxPB said:

    Excel? WTF....

    I suppose if they tried using a database it would have been MS-Access
    It beggars belief that they are storing this data in a spreadsheet! Have they not heard of databases?
    They aren't, the API runs from a Microsoft SQL database and the PowerBI visualisations run from it too.
    I presume you're talking about the API for the Coronavirus Dashboard on gov.uk, whereas the issue with Excel storage seems to be further back in the data processing chain.
    I don't think Excel is used except to generate the CSV upload files, and the database is the source of truth rather than a series of Excel files. As I said, the most likely source of error is people uploading XLSX files into a Python script which will only work with CSVs. This has become an issue when third parties (universities) have had access, so it stands to reason that the people doing the uploads didn't realise XLSX files won't work with Python.

    The file size limitation makes no sense at all: Excel has over a million rows available, and the new case per column doesn't make sense either because none of the days had more than 16.5k cases reported. Most likely Politico have a political source who also doesn't understand how these things actually work.

    I've seen the XLSX-into-Python fuck-up loads of times; it's definitely something that can happen in such a disparate system with hundreds of health trusts, testing centres and now universities all reporting in separately.
    How does that work? Surely a Python script that is expecting a CSV file as input simply wouldn't work if given an XLSX file. The formats are completely different! There would surely be some indication that the process had failed, such as the lack of an output file, for a start.
    Well, it just doesn't work, and it depends on what kind of error reporting the script has built in, what kind of queueing system there is and whether each upload is being properly monitored for success. As an end user (usually an admin assistant) I'm given instructions to prepare my upload in Excel with this structure and then to save it as a CSV and then upload it to the system. Loads of people are going to miss out the save-as-CSV step but won't stick around to see the error page (if there is one; as I said, people who write the scripts like to assume that everyone understands that Python doesn't work very well with XLSX files so won't use them).

    Don't get me wrong, I'm not absolving them of anything; it's a stupid error and has led to two weeks of under-reporting. However, I find the file size stuff difficult to believe, especially based on a source that Politico have. Once again it's lack of communication from the government here that is causing the issue. If it is a file format issue then I can understand it happening, with a script written in a rush probably using csv.reader instead of openpyxl, which is more difficult to handle and more prone to parse errors.
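To make the file-format theory concrete, here is a minimal sketch of the kind of check an upload script could run before handing anything to the csv module. It is not the actual PHE code, which nobody in this thread has seen. It relies on one known fact: an XLSX file is a ZIP archive, so it normally starts with the bytes `PK\x03\x04`. The function name `parse_upload`, the error type `UploadRejected` and the column names are all hypothetical.

```python
import csv
import io

# XLSX files are ZIP archives, so they normally begin with this signature.
ZIP_MAGIC = b"PK\x03\x04"


class UploadRejected(ValueError):
    """Hypothetical error type for uploads that are not usable CSV."""


def parse_upload(raw: bytes, expected_header: list) -> list:
    """Parse an uploaded file as CSV, refusing anything that is not CSV."""
    # Reject spreadsheets (or any other ZIP) before the CSV parser sees them.
    if raw.startswith(ZIP_MAGIC):
        raise UploadRejected("file looks like XLSX/ZIP, not CSV: save as CSV first")
    try:
        # utf-8-sig also strips the byte-order mark Excel can add to CSV exports.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise UploadRejected("file is not UTF-8 text") from exc
    reader = csv.DictReader(io.StringIO(text, newline=""))
    # A wrong or missing header row is treated as a failed upload, not as data.
    if reader.fieldnames != expected_header:
        raise UploadRejected(f"unexpected header row: {reader.fieldnames!r}")
    return list(reader)
```

The point of raising rather than returning an empty list is that the caller has to deal with the failure; whether the end user ever sees the message is a separate question.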
  • CarlottaVanceCarlottaVance Posts: 60,216

    Dura_Ace said:

    Stocky said:

    DavidL said:

    Dura_Ace said:

    DavidL said:

    kinabalu said:

    DavidL said:

    Today is our 35th wedding anniversary. We are out for afternoon tea at the Old Course hotel later. The planned walk is looking a little problematic, however. Some minor roads are closed with flooding around here.

    Congrats! - Squeeze in a few holes too?
    Lord no, what a waste of time that would be. I find golf just beyond tedious. The only good things about it are the walk and the outdoors.
    The sandpit things on my local course are excellent for jumping my CRF250.
    You go to extraordinary lengths to achieve a certain level of popularity, don't you? Remarkable.
    Leave Dura-ace alone - he's comedy gold.
    When I was 18 me and my mate did donuts on a golf course in his dad's W115 220 automatic. It open diffed and blew up the torque convertor on the second loop. His dad was livid and sent him to Sunderland Polytechnic as punishment.
    Great place Sunderland Poly. At least in its previous incarnation! Met my wife there 61 years ago!
    They have a nice law school there. :)
    Thought you went Geordie, not Mackem?
    There’s no difference between Geordies and Mackems.
    Writes a Lancastrian.....
  • MalmesburyMalmesbury Posts: 50,366

    PHE using Excel for data tells you all you need to know about their expertise.

    More common than you think - the number of systems I've seen where you can upload data by loading spreadsheets.....
  • eekeek Posts: 28,400

    Dura_Ace said:

    Stocky said:

    DavidL said:

    Dura_Ace said:

    DavidL said:

    kinabalu said:

    DavidL said:

    Today is our 35th wedding anniversary. We are out for afternoon tea at the Old Course hotel later. The planned walk is looking a little problematic, however. Some minor roads are closed with flooding around here.

    Congrats! - Squeeze in a few holes too?
    Lord no, what a waste of time that would be. I find golf just beyond tedious. The only good things about it are the walk and the outdoors.
    The sandpit things on my local course are excellent for jumping my CRF250.
    You go to extraordinary lengths to achieve a certain level of popularity, don't you? Remarkable.
    Leave Dura-ace alone - he's comedy gold.
    When I was 18 me and my mate did donuts on a golf course in his dad's W115 220 automatic. It open diffed and blew up the torque convertor on the second loop. His dad was livid and sent him to Sunderland Polytechnic as punishment.
    Great place Sunderland Poly. At least in its previous incarnation! Met my wife there 61 years ago!
    They have a nice law school there. :)
    Thought you went Geordie, not Mackem?
    There’s no difference between Geordies and Mackems.
    Writes a Lancastrian.....
    As a minimum it must be 20 miles and there is an entirely separate subset of Sanddancers in between.
  • MexicanpeteMexicanpete Posts: 28,381
    Dura_Ace said:

    Stocky said:

    DavidL said:

    Dura_Ace said:

    DavidL said:

    kinabalu said:

    DavidL said:

    Today is our 35th wedding anniversary. We are out for afternoon tea at the Old Course hotel later. The planned walk is looking a little problematic, however. Some minor roads are closed with flooding around here.

    Congrats! - Squeeze in a few holes too?
    Lord no, what a waste of time that would be. I find golf just beyond tedious. The only good things about it are the walk and the outdoors.
    The sandpit things on my local course are excellent for jumping my CRF250.
    You go to extraordinary lengths to achieve a certain level of popularity, don't you? Remarkable.
    Leave Dura-ace alone - he's comedy gold.
    When I was 18 me and my mate did donuts on a golf course in his dad's W115 220 automatic. It open diffed and blew up the torque convertor on the second loop. His dad was livid and sent him to Sunderland Polytechnic as punishment.
    Wow! Those in-house Merc. autoboxes were bulletproof.
  • MaxPBMaxPB Posts: 38,868

    PHE using Excel for data tells you all you need to know about their expertise.

    More common than you think - the number of systems I've seen where you can upload data by loading spreadsheets.....
    That sounds so awful.
  • Barnesian said:

    Barnesian said:

    Barnesian said:



    There is a difference between the candidates chosen and printed on the ballot papers for the election on 3rd November and those for the election on 9th December.

    The candidates for the Electoral College election may be different and it is the Electoral College election on 9th December that will determine the presidency.

    Anxious to avoid chaos in the electoral college just months before the November vote, the Supreme Court ruled Monday that electors who formally select the president can be required by the state they represent to cast their ballot for the candidate who won their state’s popular vote.
    The justices unanimously rejected the claim that electors have a right under the Constitution to defy their states and vote for the candidate of their choice.

    “Electors are not free agents,” Justice Elena Kagan said for the court in Chiafalo vs. Washington. “They are to vote for the candidate whom the state’s voters have chosen.” Article II of the Constitution and the 12th Amendment “give states broad power over electors, and give electors themselves no rights,” she said.

    https://www.latimes.com/politics/story/2020-07-06/supreme-court-electoral-college-states-voters
    "The justices unanimously rejected the claim that electors have a right under the Constitution to defy their states and vote for the candidate of their choice." "Article II of the Constitution and the 12th Amendment “give states broad power over electors, and give electors themselves no rights”"


    So the RNC instruct Republican Electors to cast their ballot on 9th December for Pence. What State is going to use that judgement to insist that that State's Electors vote instead for Trump? Not going to happen.
    States with a Dem Governor and State legislature who can award their state’s electoral college votes however they see fit.
    But they won't - will they? Be serious now.
    Just imagine the GOP have engaged in some voter suppression and shenanigans, and the Dems win the popular vote by 5 million but lose the electoral college because of the suppression and shenanigans. Do you think the Dems will go along with that quietly when they can correct it?
  • Nigelb said:

    Scott_xP said:

    The Donald is awake and Tweeting.

    Back to the Whitehouse today?

    "IF YOU WANT A MASSIVE TAX INCREASE, THE BIGGEST IN THE HISTORY OF OUR COUNTRY (AND ONE THAT WILL SHUT OUR ECONOMY AND JOBS DOWN), VOTE DEMOCRAT!!!"

    It's not him.

    It uses capital letters, and it says stupid shit, but there's more than that to a Trump tweet, there's a certain characteristic je ne sais quoi.

    You'd think it would be easy to copy but it's not, it's like trying to imitate Picasso or Ave It.
    Or SeanT...
    Yes, he's the Jackson Pollock of the tweeting world. You'd think anyone could do it but try it yourself. Never looks the same.
  • OldKingColeOldKingCole Posts: 33,463

    Dura_Ace said:

    Stocky said:

    DavidL said:

    Dura_Ace said:

    DavidL said:

    kinabalu said:

    DavidL said:

    Today is our 35th wedding anniversary. We are out for afternoon tea at the Old Course hotel later. The planned walk is looking a little problematic, however. Some minor roads are closed with flooding around here.

    Congrats! - Squeeze in a few holes too?
    Lord no, what a waste of time that would be. I find golf just beyond tedious. The only good things about it are the walk and the outdoors.
    The sandpit things on my local course are excellent for jumping my CRF250.
    You go to extraordinary lengths to achieve a certain level of popularity, don't you? Remarkable.
    Leave Dura-ace alone - he's comedy gold.
    When I was 18 me and my mate did donuts on a golf course in his dad's W115 220 automatic. It open diffed and blew up the torque convertor on the second loop. His dad was livid and sent him to Sunderland Polytechnic as punishment.
    Great place Sunderland Poly. At least in its previous incarnation! Met my wife there 61 years ago!
    They have a nice law school there. :)
    Thought you went Geordie, not Mackem?
    Oh of course I go Geordie. I just have visited for various talks and networking events. The facilities are nice and the staff are lovely. Just a shame it's in Sunderland really.
    Wife and I enjoyed our time there. However, 30 years later one of our nieces went and hated the place. Hated it so much she abandoned her studies.
  • FloaterFloater Posts: 14,207
    OnboardG1 said:

    Not the world's most surprising news, but thought it was worth a mention since this is the IoD's research. The idea that we're all going to jolly back to the office was always for the birds.

    https://www.bbc.co.uk/news/business-54413214

    Looks like neither I nor my team will be back in office before April next year now.
  • eekeek Posts: 28,400
    Ironically I'm working on something that is almost a track and trace system today and it does use Excel.

    But only for those clients who can't import the data directly from their core systems...
  • Dura_Ace said:

    Stocky said:

    DavidL said:

    Dura_Ace said:

    DavidL said:

    kinabalu said:

    DavidL said:

    Today is our 35th wedding anniversary. We are out for afternoon tea at the Old Course hotel later. The planned walk is looking a little problematic, however. Some minor roads are closed with flooding around here.

    Congrats! - Squeeze in a few holes too?
    Lord no, what a waste of time that would be. I find golf just beyond tedious. The only good things about it are the walk and the outdoors.
    The sandpit things on my local course are excellent for jumping my CRF250.
    You go to extraordinary lengths to achieve a certain level of popularity, don't you? Remarkable.
    Leave Dura-ace alone - he's comedy gold.
    When I was 18 me and my mate did donuts on a golf course in his dad's W115 220 automatic. It open diffed and blew up the torque convertor on the second loop. His dad was livid and sent him to Sunderland Polytechnic as punishment.
    Great place Sunderland Poly. At least in its previous incarnation! Met my wife there 61 years ago!
    They have a nice law school there. :)
    Thought you went Geordie, not Mackem?
    There’s no difference between Geordies and Mackems.
    Writes a Lancastrian.....
    They have the same area code and are in the same county.

    They are synonyms for each other.
  • @MaxPB I'm a bit bemused about this XLSX vs CSV issue. An XLSX is a binary file (?), completely different to a CSV, so I get that Python might not understand and will try to interpret the file regardless, but surely it must have some kind of verbose error logging?
  • RobDRobD Posts: 59,935
  • OldKingColeOldKingCole Posts: 33,463
    Barnesian said:

    Barnesian said:

    Scott_xP said:
    In 1981 I created a program for the ZX81 called "Cashcast" which was a spreadsheet designed for budgeting. This was before spreadsheets were available. I put it on sale for £2.99 and sold about 100 copies. I should have persevered.
    Thanks to Google I've found an ad for Cashcast. It was £4.95 not £2.99.


    Ah, the ZX81. One of my staff created a patient information system to run on that, hooked up to a second-hand TV. Cause of great admiration from patients.
  • edmundintokyoedmundintokyo Posts: 17,708
    MaxPB said:

    PHE using Excel for data tells you all you need to know about their expertise.

    More common than you think - the number of systems I've seen where you can upload data by loading spreadsheets.....
    That sounds so awful.
    Anywhere where the technology is driven by its users rather than by tech people with the power to say, "you tell us the requirements but we're going to solve the problem with our design not yours", it's going to have Excel sprouting all over the place. It's the tool of choice for non-technical people who get shit done.
  • GallowgateGallowgate Posts: 19,468
    edited October 2020

    Dura_Ace said:

    Stocky said:

    DavidL said:

    Dura_Ace said:

    DavidL said:

    kinabalu said:

    DavidL said:

    Today is our 35th wedding anniversary. We are out for afternoon tea at the Old Course hotel later. The planned walk is looking a little problematic, however. Some minor roads are closed with flooding around here.

    Congrats! - Squeeze in a few holes too?
    Lord no, what a waste of time that would be. I find golf just beyond tedious. The only good things about it are the walk and the outdoors.
    The sandpit things on my local course are excellent for jumping my CRF250.
    You go to extraordinary lengths to achieve a certain level of popularity, don't you? Remarkable.
    Leave Dura-ace alone - he's comedy gold.
    When I was 18 me and my mate did donuts on a golf course in his dad's W115 220 automatic. It open diffed and blew up the torque convertor on the second loop. His dad was livid and sent him to Sunderland Polytechnic as punishment.
    Great place Sunderland Poly. At least in its previous incarnation! Met my wife there 61 years ago!
    They have a nice law school there. :)
    Thought you went Geordie, not Mackem?
    There’s no difference between Geordies and Mackems.
    Writes a Lancastrian.....
    They have the same area code and are in the same county.

    They are synonyms for each other.
    Nope. My area code is 01661. Represent.
  • HYUFDHYUFD Posts: 123,137
    Good solid speech by Rishi, supporting businesses while keeping the finances strong.

    Though inevitably, like the last few years of Blair and Brown, his speech will be very much compared to Boris' tomorrow.
  • PulpstarPulpstar Posts: 78,205
    https://twitter.com/realDonaldTrump/status/1313073591974592512

    Judging by the party split (Democrat), primary participation (Democrat leaning amongst indies) and age of the North Carolina early vote (Oldish) it looks like people have done just that.
  • A top-flight Italian football match - Juventus v Napoli - descended into chaos on Sunday when Napoli failed to turn up in Turin because of coronavirus.

    After two team members tested positive this week, Napoli say they were ordered not to travel by their local health authority in Naples, the ASL. However, Italy's Serie A football league refused to call the game off. Napoli now face an automatic 3-0 defeat.

    --------
  • FlatlanderFlatlander Posts: 4,681

    So the spreadsheet they were using for the results reached its maximum size and simply excluded all the results that followed.

    Epic fail.

    WTAF? Who works with large datasets and doesn't know that Excel has a maximum file size? I despair.
    Who works with large datasets in Excel?!?
    A colleague has encountered this in the past. It took a long time to notice.

    His problem was that he was being sent a CSV file by a third party and didn't really pay any attention to how it was being generated. If you haven't written something to check the row counts daily, you might not spot the issue, especially if it is only updating existing data and all that happens is that some fields become stale.

    It isn't a surprise that something like this would happen given it is all being done a bit on the hoof.
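The daily row-count check described above could look something like the sketch below. The two limits are Excel's documented worksheet maximums (65,536 rows for the old .xls format, 1,048,576 for .xlsx); everything else, including the function names and the idea of comparing against yesterday's count, is an assumption for illustration rather than anything the colleague actually ran.

```python
import csv
import io

# Documented worksheet row limits for the two Excel file formats.
XLS_MAX_ROWS = 65_536
XLSX_MAX_ROWS = 1_048_576


def count_rows(csv_text: str) -> int:
    """Count every row in a CSV document, header included."""
    return sum(1 for _ in csv.reader(io.StringIO(csv_text, newline="")))


def row_count_warnings(today: int, yesterday: int) -> list:
    """Return human-readable warnings about a suspicious daily row count."""
    warnings = []
    # A file that lands exactly on a sheet limit has probably been cut off.
    if today in (XLS_MAX_ROWS, XLSX_MAX_ROWS):
        warnings.append(f"row count {today} equals an Excel sheet limit")
    # A cumulative feed should not shrink from one day to the next.
    if today < yesterday:
        warnings.append(f"row count fell from {yesterday} to {today}")
    return warnings
```

A check like this only helps if the warnings go somewhere a person will read them, which is the harder part of the problem.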
  • DavidLDavidL Posts: 53,859
    MaxPB said:

    Nigelb said:

    Perhaps we are world class, after all.
    (or the French are just as rubbish)
    https://twitter.com/_b_meyer/status/1313001720860094465

    France, Spain and the UK are all about as bad as each other.
    As you pointed out earlier though, our statisticians are rankly incompetent and our data collection is embarrassing but there is no evidence of us trying to actually hide anything. Every piece of stupidity is out there for everyone to see and assess. I am not so confident that is the case with the other 2.
  • NigelbNigelb Posts: 71,222
    kinabalu said:

    @Nigelb

    But where would you sell Biden EC supremacy?

    100 or higher than that?

    To be honest, I don't have a view on that.
    Too many seesaw states.
  • BarnesianBarnesian Posts: 8,604

    Barnesian said:

    Barnesian said:

    Barnesian said:



    There is a difference between the candidates chosen and printed on the ballot papers for the election on 3rd November and those for the election on 9th December.

    The candidates for the Electoral College election may be different and it is the Electoral College election on 9th December that will determine the presidency.

    Anxious to avoid chaos in the electoral college just months before the November vote, the Supreme Court ruled Monday that electors who formally select the president can be required by the state they represent to cast their ballot for the candidate who won their state’s popular vote.
    The justices unanimously rejected the claim that electors have a right under the Constitution to defy their states and vote for the candidate of their choice.

    “Electors are not free agents,” Justice Elena Kagan said for the court in Chiafalo vs. Washington. “They are to vote for the candidate whom the state’s voters have chosen.” Article II of the Constitution and the 12th Amendment “give states broad power over electors, and give electors themselves no rights,” she said.

    https://www.latimes.com/politics/story/2020-07-06/supreme-court-electoral-college-states-voters
    "The justices unanimously rejected the claim that electors have a right under the Constitution to defy their states and vote for the candidate of their choice." "Article II of the Constitution and the 12th Amendment “give states broad power over electors, and give electors themselves no rights”"


    So the RNC instruct Republican Electors to cast their ballot on 9th December for Pence. What State is going to use that judgement to insist that that State's Electors vote instead for Trump? Not going to happen.
    States with a Dem Governor and State legislature who can award their state’s electoral college votes however they see fit.
    But they won't - will they? Be serious now.
    Just imagine the GOP have engaged in some voter suppression and shenanigans, and the Dems win the popular vote by 5 million but lose the electoral college because of the suppression and shenanigans. Do you think the Dems will go along with that quietly when they can correct it?
    Ah. You are suggesting that States with a Dem Governor and State legislature who can award their state’s electoral college votes however they see fit will award their Electoral College votes to Biden rather than to Trump/Pence.

    OK. How many States and how many ECs are likely to go Republican but have a Dem Governor and State legislature who can switch the votes? Enough to make a difference?

    This is a totally different point from the BF rules, but entertaining nevertheless.
  • MalmesburyMalmesbury Posts: 50,366
    MaxPB said:

    PHE using Excel for data tells you all you need to know about their expertise.

    More common than you think - the number of systems I've seen where you can upload data by loading spreadsheets.....
    That sounds so awful.
    The shit I've seen. When this is over, at the next PB beers I'll do a standup routine on "IT systems - the bad, the insane and the stuff with the wrong number of dimensions"

    I know way too much about the Java Apache POI library. Awesome though it is - for *generating* spreadsheets.

    Python is a great scripting language. But for serious computing...
  • Dura_Ace said:

    Stocky said:

    DavidL said:

    Dura_Ace said:

    DavidL said:

    kinabalu said:

    DavidL said:

    Today is our 35th wedding anniversary. We are out for afternoon tea at the Old Course hotel later. The planned walk is looking a little problematic, however. Some minor roads are closed with flooding around here.

    Congrats! - Squeeze in a few holes too?
    Lord no, what a waste of time that would be. I find golf just beyond tedious. The only good things about it are the walk and the outdoors.
    The sandpit things on my local course are excellent for jumping my CRF250.
    You go to extraordinary lengths to achieve a certain level of popularity, don't you? Remarkable.
    Leave Dura-ace alone - he's comedy gold.
    When I was 18 me and my mate did donuts on a golf course in his dad's W115 220 automatic. It open diffed and blew up the torque convertor on the second loop. His dad was livid and sent him to Sunderland Polytechnic as punishment.
    Great place Sunderland Poly. At least in its previous incarnation! Met my wife there 61 years ago!
    They have a nice law school there. :)
    Thought you went Geordie, not Mackem?
    There’s no difference between Geordies and Mackems.
    Writes a Lancastrian.....
    They have the same area code and are in the same county.

    They are synonyms for each other.
    Nope. My area code is 01661. Represent.
    So if I wanted to call Sunderland council and Newcastle council what are the first four digits?
  • FrancisUrquhartFrancisUrquhart Posts: 82,103
    edited October 2020

    MaxPB said:

    PHE using Excel for data tells you all you need to know about their expertise.

    More common than you think - the number of systems I've seen where you can upload data by loading spreadsheets.....
    That sounds so awful.
    The shit I've seen. When this is over, at the next PB beers I'll do a standup routine on "IT systems - the bad, the insane and the stuff with the wrong number of dimensions"

    I know way too much about the Java Apache POI library. Awesome though it is - for *generating* spreadsheets.

    Python is a great scripting language. But for serious computing...
    Ban hammer incoming in 5...4...3...2...1....
  • MalmesburyMalmesbury Posts: 50,366
    edited October 2020

    @MaxPB I'm a bit bemused about this XLSX vs CSV issue. An XLSX is a binary file (?), completely different to a CSV, so I get that Python might not understand and will try to interpret the file regardless, but surely it must have some kind of verbose error logging?

    Quite - unless some fool simply suppressed the error completely. So attempt to upload XLS (or XLSX) and get... nothing.
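For anyone wondering what "suppressed the error completely" looks like in practice, here is a hedged sketch contrasting the two styles. Both function names are made up, and the sample bytes are hand-made to be invalid UTF-8 rather than a real spreadsheet. The first loader swallows every exception and returns an empty list, so a bad upload is indistinguishable from an empty one; the second logs the failure and re-raises it.

```python
import csv
import io
import logging

logger = logging.getLogger("upload")  # hypothetical logger name


def load_silently(raw: bytes) -> list:
    """Anti-pattern: every failure is swallowed and looks like 'no rows'."""
    try:
        return list(csv.reader(io.StringIO(raw.decode("utf-8"), newline="")))
    except Exception:
        return []  # the upload "succeeds" with nothing in it


def load_loudly(raw: bytes) -> list:
    """Same parse, but failures are logged with a traceback and re-raised."""
    try:
        return list(csv.reader(io.StringIO(raw.decode("utf-8"), newline="")))
    except (UnicodeDecodeError, csv.Error):
        logger.exception("upload could not be parsed as CSV")
        raise


# Hand-made bytes: a ZIP-style signature followed by bytes that are not UTF-8.
not_csv = b"PK\x03\x04\xff\xfe"
```

With `not_csv` as input, `load_silently` returns an empty list while `load_loudly` raises `UnicodeDecodeError` after writing the traceback to the log.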
  • MaxPB said:

    MaxPB said:

    MaxPB said:

    Excel? WTF....

    I suppose if they tried using a database it would have been MS-Access
    It beggars belief that they are storing this data in a spreadsheet! Have they not heard of databases?
    They aren't, the API runs from a Microsoft SQL database and the PowerBI visualisations run from it too.
    I presume you're talking about the API for the Coronavirus Dashboard on gov.uk, whereas the issue with Excel storage seems to be further back in the data processing chain.
    I don't think Excel is used except to generate the CSV upload files, and the database is the source of truth rather than a series of Excel files. As I said, the most likely source of error is people uploading XLSX files into a Python script which will only work with CSVs. This has become an issue when third parties (universities) have had access, so it stands to reason that the people doing the uploads didn't realise XLSX files won't work with Python.

    The file size limitation makes no sense at all: Excel has over a million rows available, and the new case per column doesn't make sense either because none of the days had more than 16.5k cases reported. Most likely Politico have a political source who also doesn't understand how these things actually work.

    I've seen the XLSX-into-Python fuck-up loads of times; it's definitely something that can happen in such a disparate system with hundreds of health trusts, testing centres and now universities all reporting in separately.
    How does that work? Surely a Python script that is expecting a CSV file as input simply wouldn't work if given an XLSX file. The formats are completely different! There would surely be some indication that the process had failed, such as the lack of an output file, for a start.
    Well, it just doesn't work, and it depends on what kind of error reporting the script has built in, what kind of queueing system there is and whether each upload is being properly monitored for success. As an end user (usually an admin assistant) I'm given instructions to prepare my upload in Excel with this structure and then to save it as a CSV and then upload it to the system. Loads of people are going to miss out the save-as-CSV step but won't stick around to see the error page (if there is one; as I said, people who write the scripts like to assume that everyone understands that Python doesn't work very well with XLSX files so won't use them).

    Don't get me wrong, I'm not absolving them of anything; it's a stupid error and has led to two weeks of under-reporting. However, I find the file size stuff difficult to believe, especially based on a source that Politico have. Once again it's lack of communication from the government here that is causing the issue. If it is a file format issue then I can understand it happening, with a script written in a rush probably using csv.reader instead of openpyxl, which is more difficult to handle and more prone to parse errors.
    That sounds very unlikely to me. Who would build an uploading system for such important data that performed no server-side verification at all? Surely you'd also have client-side verification to ensure that, at the bare minimum, the uploaded files had the correct extension!
  • GallowgateGallowgate Posts: 19,468

    Dura_Ace said:

    Stocky said:

    DavidL said:

    Dura_Ace said:

    DavidL said:

    kinabalu said:

    DavidL said:

    Today is our 35th wedding anniversary. We are out for afternoon tea at the Old Course hotel later. The planned walk is looking a little problematic, however. Some minor roads are closed with flooding around here.

    Congrats! - Squeeze in a few holes too?
    Lord no, what a waste of time that would be. I find golf just beyond tedious. The only good things about it are the walk and the outdoors.
    The sandpit things on my local course are excellent for jumping my CRF250.
    You go to extraordinary lengths to achieve a certain level of popularity, don't you? Remarkable.
    Leave Dura-ace alone - he's comedy gold.
    When I was 18 me and my mate did donuts on a golf course in his dad's W115 220 automatic. It open diffed and blew up the torque convertor on the second loop. His dad was livid and sent him to Sunderland Polytechnic as punishment.
    Great place Sunderland Poly. At least in its previous incarnation! Met my wife there 61 years ago!
    They have a nice law school there. :)
    Thought you went Geordie, not Mackem?
    There’s no difference between Geordies and Mackems.
    Writes a Lancastrian.....
    They have the same area code and are in the same county.

    They are synonyms for each other.
    Nope. My area code is 01661. Represent.
    So if I wanted to call Sunderland council and Newcastle council what are the first four digits?
    Why would anyone want to call Sunderland Council?
  • OldKingColeOldKingCole Posts: 33,463

    Dura_Ace said:

    Stocky said:

    DavidL said:

    Dura_Ace said:

    DavidL said:

    kinabalu said:

    DavidL said:

    Today is our 35th wedding anniversary. We are out for afternoon tea at the Old Course hotel later. The planned walk is looking a little problematic, however. Some minor roads are closed with flooding around here.

    Congrats! - Squeeze in a few holes too?
    Lord no, what a waste of time that would be. I find golf just beyond tedious. The only good things about it are the walk and the outdoors.
    The sandpit things on my local course are excellent for jumping my CRF250.
    You go to extraordinary lengths to achieve a certain level of popularity, don't you? Remarkable.
    Leave Dura-ace alone - he`s comedy gold.
    When I was 18 me and my mate did donuts on a golf course in his dad's W115 220 automatic. It open diffed and blew up the torque convertor on the second loop. His dad was livid and sent him to Sunderland Polytechnic as punishment.
    Great place Sunderland Poly. At least in its previous incarnation! Met my wife there 61 years ago!
    They have a nice law school there. :)
    Thought you went Geordie, not Mackem?
    There’s no difference between Geordies and Mackems.
    Writes a Lancastrian.....
    They have the same area code and are in the same county.

    They are synonyms for each other.
    Go there and say that.

    Writes an Essex boy!
  • Andy_JSAndy_JS Posts: 32,594
    Scott_xP said:
    Paradoxically the error might have been spotted earlier if older software/hardware had been used.
  • MaxPBMaxPB Posts: 38,868

    @MaxPB I'm a bit bemused by this XLSX vs CSV issue. An XLSX is a binary file (?), completely different to a CSV, so I get that Python might not understand it and will try to interpret the file regardless, but surely it must have some kind of verbose error logging?

    Ideally you would throw up an error on clicking upload that it's not a valid file format, but I've seen systems where it doesn't do anything and gives the end user no feedback on success or failure. As I said, this seems much more plausible than a file size limitation. Excel can store a million rows, and literally no one uses columns for anything other than headers; it's just about the stupidest idea I've heard.

    If your script has been written in a rush and uses csv.reader to parse standardised CSV files it will work pretty reliably, especially in a closed system where everyone has been trained to use the system properly. It's unsurprising that this started to become an issue when third party access was granted to universities, the training probably wasn't very good and the instructions were probably ignored. I've only seen it happen about a million times.
  • londonpubmanlondonpubman Posts: 3,639
    Good to see Rishi keeping it positive, keeping it direct, keeping it simple!
  • MaxPB said:

    RobD said:

    So the spreadsheet they were using for the results reached its maximum size and simply excluded all the results that followed.

    Epic fail.

    WTAF? Who works with large datasets and doesn't know that Excel has a maximum file size? I despair.
    The email size limit being the issue seems more believable; the maximum Excel size/length is surely enormous?

    Edit: Ah, if the entire list of cases is being stored in Excel I could see that being an issue! Weird that they were still able to add some cases but not all on the affected days though.
    I bet the issue is that the upload system only handles CSVs (a Python limitation) and Excel files were being put into it. I've seen that happen loads of times because the people making the system intrinsically understand that Python works properly with CSVs but doesn't with Excel files, but the people doing the uploads don't know the difference.
    The format of the file sounds insane - columns for dynamic data is bad.
    World beating. Betcha nobody else in the world was doing it our way
  • FlatlanderFlatlander Posts: 4,681

    MaxPB said:

    PHE using Excel for data tells you all you need to know about their expertise.

    More common than you think - the number of systems I've seen where you can upload data by loading spreadsheets.....
    That sounds so awful.
    The shit I've seen. When this is over, at the next PB beers I'll do a standup routine on "IT systems - the bad, the insane and the stuff with the wrong number of dimensions"

    I know way too much about the Java Apache POI library. Awesome though it is - for *generating* spreadsheets.

    Python is a great scripting language. But for serious computing...
    What do you mean by serious? Launching a nuclear strike or some data wrangling?

    For the latter I don't think it really matters that much what language you use. What matters is how well it is written and managed, and how well the writer understands what they are doing.
  • eekeek Posts: 28,400
    Andy_JS said:

    Scott_xP said:
    Paradoxically the error might have been spotted earlier if older software/hardware had been used.
    The error would have been spotted immediately if anyone had revealed that the system was using Excel.
  • Andy_JSAndy_JS Posts: 32,594
    edited October 2020
    The fact that there wasn't some sort of checking system in place to make sure that new entries were indeed being recorded as they were added to the database is mind-boggling.
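    For what it's worth, the sort of check meant here could be as small as comparing the number of rows submitted with the number that ended up stored. A minimal sketch, assuming a CSV upload with a header row; the function names and sample data are made up for illustration and are not anything PHE is known to run:

```python
import csv
import io


def count_data_rows(csv_text: str) -> int:
    """Count the rows in a CSV upload, excluding the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, if there is one
    return sum(1 for row in reader if row)  # ignore blank lines


def reconcile(csv_text: str, stored_count: int) -> None:
    """Raise if the number of rows stored differs from the number submitted."""
    submitted = count_data_rows(csv_text)
    if submitted != stored_count:
        raise ValueError(
            f"{submitted} rows submitted but {stored_count} stored; "
            f"{submitted - stored_count} missing"
        )


# Hypothetical upload: three cases under made-up column names.
upload = "case_id,specimen_date\n1,2020-09-25\n2,2020-09-25\n3,2020-09-26\n"
reconcile(upload, stored_count=3)  # counts match, so nothing is raised
```

    If the stored count ever came back lower than the submitted count, the job would fail loudly instead of quietly dropping cases.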
  • Hold on, they used COLUMNS for each individual case?
  • FeersumEnjineeyaFeersumEnjineeya Posts: 4,429
    edited October 2020

    @MaxPB I'm a bit bemused by this XLSX vs CSV issue. An XLSX is a binary file (?), completely different to a CSV, so I get that Python might not understand it and will try to interpret the file regardless, but surely it must have some kind of verbose error logging?

    The XLSX format is zipped XML, so yes, completely different to CSV. Any system that required data in CSV format would surely fail with an error message if you even tried to upload an XLSX file. At the very least, you'd write the client so that only files with a .csv extension could be accepted.
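    To illustrate the point: because an XLSX file is a ZIP container, it starts with the ZIP signature bytes, so a server can spot a renamed workbook without trusting the extension. A rough sketch only; `looks_like_xlsx` and `check_upload` are hypothetical names, and the in-memory archive below is a stand-in for a real workbook, not a valid one:

```python
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"  # local file header signature at the start of a non-empty ZIP


def looks_like_xlsx(data: bytes) -> bool:
    """Return True if the bytes look like a ZIP container (as XLSX files are)."""
    return data[:4] == ZIP_MAGIC


def check_upload(filename: str, data: bytes) -> None:
    """Reject anything that is not named .csv or that is really a ZIP archive."""
    if not filename.lower().endswith(".csv"):
        raise ValueError(f"{filename}: expected a .csv file")
    if looks_like_xlsx(data):
        raise ValueError(f"{filename}: looks like an Excel workbook renamed to .csv")


# Build a tiny ZIP archive in memory to stand in for an XLSX upload.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/workbook.xml", "<workbook/>")
fake_xlsx = buf.getvalue()

print(looks_like_xlsx(fake_xlsx))
print(looks_like_xlsx(b"case_id,specimen_date\n1,2020-09-25\n"))
```

    The extension check alone is easy to defeat by renaming the file, which is why the content check sits alongside it.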
  • Couldn't they have at least done a pivot? It's like a one-line operation.
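    As a sketch of how small that operation is, assuming the sheet has been read into a list of rows with one column per case (the field names and values here are invented), `zip(*rows)` transposes it so each case becomes a row:

```python
# Hypothetical "wide" layout: one column per case, one row per field.
wide = [
    ["field", "case_1", "case_2", "case_3"],
    ["specimen_date", "2020-09-25", "2020-09-25", "2020-09-26"],
    ["region", "North East", "London", "North West"],
]

# Transpose: each case becomes a row, each field becomes a column.
long = [list(row) for row in zip(*wide)]

for row in long:
    print(row)
```

    Note that `zip` stops at the shortest row, so a ragged sheet would need padding first.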
  • eekeek Posts: 28,400
    edited October 2020

    Dura_Ace said:

    Stocky said:

    DavidL said:

    Dura_Ace said:

    DavidL said:

    kinabalu said:

    DavidL said:

    Today is our 35th wedding anniversary. We are out for afternoon tea at the Old Course hotel later. The planned walk is looking a little problematic, however. Some minor roads are closed with flooding around here.

    Congrats! - Squeeze in a few holes too?
    Lord no, what a waste of time that would be. I find golf just beyond tedious. The only good things about it are the walk and the outdoors.
    The sandpit things on my local course are excellent for jumping my CRF250.
    You go to extraordinary lengths to achieve a certain level of popularity, don't you? Remarkable.
    Leave Dura-ace alone - he`s comedy gold.
    When I was 18 me and my mate did donuts on a golf course in his dad's W115 220 automatic. It open diffed and blew up the torque convertor on the second loop. His dad was livid and sent him to Sunderland Polytechnic as punishment.
    Great place Sunderland Poly. At least in its previous incarnation! Met my wife there 61 years ago!
    They have a nice law school there. :)
    Thought you went Geordie, not Mackem?
    There’s no difference between Geordies and Mackems.
    Writes a Lancastrian.....
    They have the same area code and are in the same county.

    They are synonyms for each other.
    Nope. My area code is 01661. Represent.
    So if I wanted to call Sunderland council and Newcastle council what are the first four digits?
    0191 - but then again I'm not posh and living somewhere around Ponteland... Equally I think 0191 gives you Durham County Council and the preferred university for posh Oxford and Cambridge rejects.
  • Dura_AceDura_Ace Posts: 13,677

    Dura_Ace said:

    Stocky said:

    DavidL said:

    Dura_Ace said:

    DavidL said:

    kinabalu said:

    DavidL said:

    Today is our 35th wedding anniversary. We are out for afternoon tea at the Old Course hotel later. The planned walk is looking a little problematic, however. Some minor roads are closed with flooding around here.

    Congrats! - Squeeze in a few holes too?
    Lord no, what a waste of time that would be. I find golf just beyond tedious. The only good things about it are the walk and the outdoors.
    The sandpit things on my local course are excellent for jumping my CRF250.
    You go to extraordinary lengths to achieve a certain level of popularity, don't you? Remarkable.
    Leave Dura-ace alone - he`s comedy gold.
    When I was 18 me and my mate did donuts on a golf course in his dad's W115 220 automatic. It open diffed and blew up the torque convertor on the second loop. His dad was livid and sent him to Sunderland Polytechnic as punishment.
    Wow! Those in-house Merc. autoboxes were bulletproof.
    High rpm with relatively light load is an excellent way to destroy an auto, as the output shaft speed can drop to zero in an instant while the input shaft is still raging.
  • @MaxPB I'm a bit bemused by this XLSX vs CSV issue. An XLSX is a binary file (?), completely different to a CSV, so I get that Python might not understand it and will try to interpret the file regardless, but surely it must have some kind of verbose error logging?

    The XLSX format is zipped XML, so yes, completely different to CSV. Any system that required data in CSV format would surely fail with an error message if you even tried to upload an XLSX file. At the very least, you'd write the client so that only files with a .csv extension could be accepted.
    I worked with CSVs and XLSX files in my last role and we did exactly this. I was using PHP (albeit not Python), but when this did happen, right at the early stages, it crashed on the first line.

    What we used to do was check the header row first; I imagine this is probably quite a typical thing to do. Anything unexpected would throw an exception.

    But I suppose this is beyond them, considering they were adding each new case as a new COLUMN
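    A Python equivalent of that header check might look like the sketch below. The column names are hypothetical, and this is only an illustration of the approach, not the code from my old job or from PHE:

```python
import csv
import io

EXPECTED_HEADER = ["case_id", "specimen_date", "region"]  # hypothetical column names


def read_cases(text: str) -> list[dict[str, str]]:
    """Parse CSV text, refusing anything whose first row is not the expected header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)  # None if the upload is empty
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected header row: {header!r}")
    return [dict(zip(header, row)) for row in reader if row]


good = "case_id,specimen_date,region\n1,2020-09-25,London\n"
print(read_cases(good))
```

    Anything that is not a CSV in the agreed layout, a renamed workbook included, fails on the very first row rather than part-way through the import.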
  • eekeek Posts: 28,400

    @MaxPB I'm a bit bemused by this XLSX vs CSV issue. An XLSX is a binary file (?), completely different to a CSV, so I get that Python might not understand it and will try to interpret the file regardless, but surely it must have some kind of verbose error logging?

    XLSX isn't a binary format - compressed (and truly hideous) XML would be a better description.
  • I think the old adage holds: government and IT = disaster.....
  • MaxPBMaxPB Posts: 38,868

    MaxPB said:

    MaxPB said:

    MaxPB said:

    Excel? WTF....

    I suppose if they tried using a database it would have been MS-Access
    It beggars belief that they are storing this data in a spreadsheet! Have they not heard of databases?
    They aren't, the API runs from a Microsoft SQL database and the PowerBI visualisations run from it too.
    I presume you're talking about the API for the Coronavirus Dashboard on gov.uk, whereas the issue with Excel storage seems to be further back in the data processing chain.
    I don't think Excel is used except to generate the CSV upload files, and the database is the source of truth rather than a series of Excel files. As I said, the most likely source of error is people uploading XLSX files into a Python script which will only work with CSVs. This became an issue when third parties (universities) were given access, so it stands to reason that the people doing the uploads didn't realise XLSX files won't work with Python.

    The file size limitation makes no sense at all: Excel has over a million rows available, and the new case per column doesn't make sense either because none of the days had more than 16.5k cases reported. Most likely Politico have a political source who also doesn't understand how these things actually work.

    I've seen the XLSX-into-Python fuck-up loads of times; it's definitely something that can happen in such a disparate system with hundreds of health trusts, testing centres and now universities all reporting in separately.
    How does that work? Surely a Python script that is expecting a CSV file as input simply wouldn't work if given an XLSX file. The formats are completely different! There would surely be some indication that the process had failed, such as the lack of an output file, for a start.
    Well, it just doesn't work, and it depends on what kind of error reporting the script has built in, what kind of queueing system there is and whether each upload is being properly monitored for success. As an end user (usually an admin assistant) I'm given instructions to prepare my upload in Excel with this structure, then to save it as a CSV and upload it to the system. Loads of people are going to miss out the save-as-CSV step but won't stick around to see the error page (if there is one; as I said, people who write the scripts like to assume that everyone understands that Python doesn't work very well with XLSX files and so won't use them).

    Don't get me wrong, I'm not absolving them of anything; it's a stupid error and has led to two weeks of under-reporting. However, I find the file size stuff difficult to believe, especially based on a source that Politico have. Once again it's lack of communication from the government here that is causing the issue. If it is a file format issue then I can understand it happening, with a script written in a rush probably using csv.reader instead of openpyxl, which is more difficult to handle and more prone to parse errors.
    That sounds very unlikely to me. Who would build an uploading system for such important data that performed no server-side verification at all? Surely you'd also have client-side verification to ensure that, at the bare minimum, the uploaded files had the correct extension!
    It probably does do server-side verification, but which end user is going to stick around to see the result of that? I just don't think they've made a robust system built to handle every scenario you can throw at it.

    There's a huge difference between building a system that works in a perfect environment and one that works everywhere. The former is probably what has been cooked up in a short space of time.

    I'd honestly love to see the actual scripts they're using to put data into the database because they're probably extremely basic and probably don't have a fallback in case someone uploads an XLSX file.
  • And still 'Green' types will be unhappy, because it won't be socialism.
  • MalmesburyMalmesbury Posts: 50,366
    This isn't track and trace. This is PHE, which was collecting the data to feed to track and trace.

    But apart from that....
  • @MaxPB I'm a bit bemused by this XLSX vs CSV issue. An XLSX is a binary file (?), completely different to a CSV, so I get that Python might not understand it and will try to interpret the file regardless, but surely it must have some kind of verbose error logging?

    The XLSX format is zipped XML, so yes, completely different to CSV. Any system that required data in CSV format would surely fail with an error message if you even tried to upload an XLSX file. At the very least, you'd write the client so that only files with a .csv extension could be accepted.
    I worked with CSVs and XLSX files in my last role and we did exactly this. I was using PHP (albeit not Python), but when this did happen, right at the early stages, it crashed on the first line.

    What we used to do was check the header row first; I imagine this is probably quite a typical thing to do. Anything unexpected would throw an exception.

    But I suppose this is beyond them, considering they were adding each new case as a new COLUMN
    The thought that you'd use columns for cases just seems too bizarre to be true.
  • CorrectHorseBatteryCorrectHorseBattery Posts: 21,436
    edited October 2020
    MaxPB said:

    @MaxPB I'm a bit bemused by this XLSX vs CSV issue. An XLSX is a binary file (?), completely different to a CSV, so I get that Python might not understand it and will try to interpret the file regardless, but surely it must have some kind of verbose error logging?

    Ideally you would throw up an error on clicking upload that it's not a valid file format, but I've seen systems where it doesn't do anything and gives the end user no feedback on success or failure. As I said, this seems much more plausible than a file size limitation. Excel can store a million rows, and literally no one uses columns for anything other than headers; it's just about the stupidest idea I've heard.

    If your script has been written in a rush and uses csv.reader to parse standardised CSV files it will work pretty reliably, especially in a closed system where everyone has been trained to use the system properly. It's unsurprising that this started to become an issue when third party access was granted to universities, the training probably wasn't very good and the instructions were probably ignored. I've only seen it happen about a million times.
    I take the point on the file size limitation, but surely any sensible dev/eng would have implemented a failsafe on the backend that reports to a log. You'd pick it up quite quickly, I would have thought: you would be looping through and instantly throw an exception because it would be nonsense compared to what the interpreter was expecting.

    I would have thought the library would do this for you, in fact.
  • kinabalukinabalu Posts: 42,226
    They really have lost the dressing room.
  • rottenboroughrottenborough Posts: 62,766
    Andy_JS said:
    Want to take a wild guess?
  • You're quite right, I was thinking of XLSB. Sorry for the foot in mouth, but the rest of what I was getting at still fits.
  • isamisam Posts: 41,118
    Thought this was a clever parody. Even the name... but no, seems real

    https://twitter.com/eleanormargolis/status/1313064245798596608?s=21
  • rottenboroughrottenborough Posts: 62,766

    And still 'Green' types will be unhappy, because it won't be socialism.
    Rolls Royce will be happier though. They make small nuclear.
  • MalmesburyMalmesbury Posts: 50,366

    MaxPB said:

    PHE using Excel for data tells you all you need to know about their expertise.

    More common than you think - the number of systems I've seen where you can upload data by loading spreadsheets.....
    That sounds so awful.
    The shit I've seen. When this is over, at the next PB beers I'll do a standup routine on "IT systems - the bad, the insane and the stuff with the wrong number of dimensions"

    I know way too much about the Java Apache POI library. Awesome though it is - for *generating* spreadsheets.

    Python is a great scripting language. But for serious computing...
    What do you mean by serious? Launching a nuclear strike or some data wrangling?

    For the latter I don't think it really matters that much what language you use. What matters is how well it is written and managed, and how well the writer understands what they are doing.
    It does matter. One of the problems with Python is the culture of the "developers"*. The number of times I have had to explain concepts of code structure and testing to Python and JavaScript... code writers..... There are a lot of "but it runs, ship it" Python hackers out there.

    *Many I wouldn't class as real developers
  • MaxPB said:

    MaxPB said:

    MaxPB said:

    MaxPB said:

    Excel? WTF....

    I suppose if they tried using a database it would have been MS-Access
    It beggars belief that they are storing this data in a spreadsheet! Have they not heard of databases?
    They aren't, the API runs from a Microsoft SQL database and the PowerBI visualisations run from it too.
    I presume you're talking about the API for the Coronavirus Dashboard on gov.uk, whereas the issue with Excel storage seems to be further back in the data processing chain.
    I don't think Excel is used except to generate the CSV upload files, and the database is the source of truth rather than a series of Excel files. As I said, the most likely source of error is people uploading XLSX files into a Python script which will only work with CSVs. This became an issue when third parties (universities) were given access, so it stands to reason that the people doing the uploads didn't realise XLSX files won't work with Python.

    The file size limitation makes no sense at all: Excel has over a million rows available, and the new case per column doesn't make sense either because none of the days had more than 16.5k cases reported. Most likely Politico have a political source who also doesn't understand how these things actually work.

    I've seen the XLSX-into-Python fuck-up loads of times; it's definitely something that can happen in such a disparate system with hundreds of health trusts, testing centres and now universities all reporting in separately.
    How does that work? Surely a Python script that is expecting a CSV file as input simply wouldn't work if given an XLSX file. The formats are completely different! There would surely be some indication that the process had failed, such as the lack of an output file, for a start.
    Well, it just doesn't work, and it depends on what kind of error reporting the script has built in, what kind of queueing system there is and whether each upload is being properly monitored for success. As an end user (usually an admin assistant) I'm given instructions to prepare my upload in Excel with this structure, then to save it as a CSV and upload it to the system. Loads of people are going to miss out the save-as-CSV step but won't stick around to see the error page (if there is one; as I said, people who write the scripts like to assume that everyone understands that Python doesn't work very well with XLSX files and so won't use them).

    Don't get me wrong, I'm not absolving them of anything; it's a stupid error and has led to two weeks of under-reporting. However, I find the file size stuff difficult to believe, especially based on a source that Politico have. Once again it's lack of communication from the government here that is causing the issue. If it is a file format issue then I can understand it happening, with a script written in a rush probably using csv.reader instead of openpyxl, which is more difficult to handle and more prone to parse errors.
    That sounds very unlikely to me. Who would build an uploading system for such important data that performed no server-side verification at all? Surely you'd also have client-side verification to ensure that, at the bare minimum, the uploaded files had the correct extension!
    It probably does do server-side verification, but which end user is going to stick around to see the result of that? I just don't think they've made a robust system built to handle every scenario you can throw at it.

    There's a huge difference between building a system that works in a perfect environment and one that works everywhere. The former is probably what has been cooked up in a short space of time.

    I'd honestly love to see the actual scripts they're using to put data into the database because they're probably extremely basic and probably don't have a fallback in case someone uploads an XLSX file.
    Does the library they're using not throw an exception when it starts trying to loop through a zip file rather than a CSV?
  • MaxPBMaxPB Posts: 38,868

    MaxPB said:

    @MaxPB I'm a bit bemused by this XLSX vs CSV issue. An XLSX is a binary file (?), completely different to a CSV, so I get that Python might not understand it and will try to interpret the file regardless, but surely it must have some kind of verbose error logging?

    Ideally you would throw up an error on clicking upload that it's not a valid file format, but I've seen systems where it doesn't do anything and gives the end user no feedback on success or failure. As I said, this seems much more plausible than a file size limitation. Excel can store a million rows, and literally no one uses columns for anything other than headers; it's just about the stupidest idea I've heard.

    If your script has been written in a rush and uses csv.reader to parse standardised CSV files it will work pretty reliably, especially in a closed system where everyone has been trained to use the system properly. It's unsurprising that this started to become an issue when third party access was granted to universities, the training probably wasn't very good and the instructions were probably ignored. I've only seen it happen about a million times.
    I take the point on the file size limitation, but surely any sensible dev/eng would have implemented a failsafe on the backend that reports to a log. You'd pick it up quite quickly, I would have thought: you would be looping through and instantly throw an exception because it would be nonsense compared to what the interpreter was expecting.

    I would have thought the library would do this for you, in fact.
    Yeah, that's probably how the error was spotted, by some junior sifting through the script success logs when a manual audit was being done.
  • OnboardG1OnboardG1 Posts: 1,589

    And still 'Green' types will be unhappy, because it won't be socialism.
    Oh on your bike with the culture war today. This is objectively good news. There's always more to be done but one of the few things I think this government has twigged is that money is to be made in decarbonisation.
  • @MaxPB I'm a bit bemused by this XLSX vs CSV issue. An XLSX is a binary file (?), completely different to a CSV, so I get that Python might not understand it and will try to interpret the file regardless, but surely it must have some kind of verbose error logging?

    The XLSX format is zipped XML, so yes, completely different to CSV. Any system that required data in CSV format would surely fail with an error message if you even tried to upload an XLSX file. At the very least, you'd write the client so that only files with a .csv extension could be accepted.
    I worked with CSVs and XLSX files in my last role and we did exactly this. I was using PHP (albeit not Python), but when this did happen, right at the early stages, it crashed on the first line.

    What we used to do was check the header row first; I imagine this is probably quite a typical thing to do. Anything unexpected would throw an exception.

    But I suppose this is beyond them, considering they were adding each new case as a new COLUMN
    The thought that you'd use columns for cases just seems too bizarre to be true.
    I've seen it in businesses before. Shocks me every time I've seen it, but I've seen it too many times.
  • FlatlanderFlatlander Posts: 4,681

    @MaxPB I'm a bit bemused by this XLSX vs CSV issue. An XLSX is a binary file (?), completely different to a CSV, so I get that Python might not understand it and will try to interpret the file regardless, but surely it must have some kind of verbose error logging?

    The XLSX format is zipped XML, so yes, completely different to CSV. Any system that required data in CSV format would surely fail with an error message if you even tried to upload an XLSX file. At the very least, you'd write the client so that only files with a .csv extension could be accepted.
    I worked with CSVs and XLSX files in my last role and we did exactly this. I was using PHP (albeit not Python), but when this did happen, right at the early stages, it crashed on the first line.

    What we used to do was check the header row first; I imagine this is probably quite a typical thing to do. Anything unexpected would throw an exception.

    But I suppose this is beyond them, considering they were adding each new case as a new COLUMN
    If that's the case, they must have got the DEFRA team in.

    The old Farm Environment Plan form was easily the worst use of a spreadsheet I have ever seen.


  • MaxPBMaxPB Posts: 38,868

    MaxPB said:

    MaxPB said:

    MaxPB said:

    MaxPB said:

    Excel? WTF....

    I suppose if they tried using a database it would have been MS-Access
    It beggars belief that they are storing this data in a spreadsheet! Have they not heard of databases?
    They aren't, the API runs from a Microsoft SQL database and the PowerBI visualisations run from it too.
    I presume you're talking about the API for the Coronavirus Dashboard on gov.uk, whereas the issue with Excel storage seems to be further back in the data processing chain.
    I don't think Excel is used except to generate the CSV upload files, and the database is the source of truth rather than a series of Excel files. As I said, the most likely source of error is people uploading XLSX files into a Python script which will only work with CSVs. This became an issue when third parties (universities) were given access, so it stands to reason that the people doing the uploads didn't realise XLSX files won't work with Python.

    The file size limitation makes no sense at all: Excel has over a million rows available, and the new case per column doesn't make sense either because none of the days had more than 16.5k cases reported. Most likely Politico have a political source who also doesn't understand how these things actually work.

    I've seen the XLSX-into-Python fuck-up loads of times; it's definitely something that can happen in such a disparate system with hundreds of health trusts, testing centres and now universities all reporting in separately.
    How does that work? Surely a Python script that is expecting a CSV file as input simply wouldn't work if given an XLSX file. The formats are completely different! There would surely be some indication that the process had failed, such as the lack of an output file, for a start.
    Well, it just doesn't work, and it depends on what kind of error reporting the script has built in, what kind of queueing system there is and whether each upload is being properly monitored for success. As an end user (usually an admin assistant) I'm given instructions to prepare my upload in Excel with this structure, then to save it as a CSV and upload it to the system. Loads of people are going to miss out the save-as-CSV step but won't stick around to see the error page (if there is one; as I said, people who write the scripts like to assume that everyone understands that Python doesn't work very well with XLSX files and so won't use them).

    Don't get me wrong, I'm not absolving them of anything; it's a stupid error and has led to two weeks of under-reporting. However, I find the file size stuff difficult to believe, especially based on a source that Politico have. Once again it's lack of communication from the government here that is causing the issue. If it is a file format issue then I can understand it happening, with a script written in a rush probably using csv.reader instead of openpyxl, which is more difficult to handle and more prone to parse errors.
    That sounds very unlikely to me. Who would build an uploading system for such important data that performed no server-side verification at all? Surely you'd also have client-side verification to ensure that, at the bare minimum, the uploaded files had the correct extension!
    It probably does do server-side verification, but which end user is going to stick around to see the result of that? I just don't think they've made a robust system built to handle every scenario you can throw at it.

    There's a huge difference between building a system that works in a perfect environment and one that works everywhere. The former is probably what has been cooked up in a short space of time.

    I'd honestly love to see the actual scripts they're using to put data into the database because they're probably extremely basic and probably don't have a fallback in case someone uploads an XLSX file.
    Does the library they're using not throw an exception when it starts trying to loop through a zip file rather than a CSV?
    It would, but it depends on who is monitoring the exceptions and whether the end user is actually notified of a script failure immediately or even by email if there is a queue.
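    To make that concrete, here is a rough sketch of an upload handler that both logs the failure for whoever monitors the system and hands back a result that could be shown or emailed to the uploader. Everything in it (the logger name, `process_upload`, the result fields) is hypothetical, and the parsing is deliberately simplistic:

```python
import logging

logger = logging.getLogger("uploads")  # hypothetical logger name


def process_upload(filename: str, data: bytes) -> dict:
    """Parse an upload and return a result the caller can show or email to the uploader."""
    try:
        text = data.decode("utf-8")  # compressed XLSX bytes will usually fail here
        lines = [line for line in text.splitlines() if line.strip()]
        if len(lines) < 2:
            raise ValueError("no data rows found")
    except ValueError as exc:  # UnicodeDecodeError is a subclass of ValueError
        logger.error("upload %s failed: %s", filename, exc)
        return {"ok": False, "filename": filename, "error": str(exc)}
    logger.info("upload %s accepted with %d rows", filename, len(lines) - 1)
    return {"ok": True, "filename": filename, "rows": len(lines) - 1}


result = process_upload("cases.csv", b"case_id,specimen_date\n1,2020-09-25\n")
print(result)
```

    The point is only that the failure ends up in two places: the log and the response to the person who did the upload.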
  • It boggles the mind even more when I think about it, that they were using columns.

    It's literally easier to not use columns, why on Earth were they doing that
  • MaxPB said:

    MaxPB said:

    MaxPB said:

    MaxPB said:

    MaxPB said:

    Excel? WTF....

    I suppose if they tried using a database it would have been MS-Access
    It beggars belief that they are storing this data in a spreadsheet! Have they not heard of databases?
    They aren't, the API runs from a Microsoft SQL database and the PowerBI visualisations run from it too.
    I presume you're talking about the API for the Coronavirus Dashboard on gov.uk, whereas the issue with Excel storage seems to be further back in the data processing chain.
    I don't think Excel is used except to generate the CSV upload files and the database is the source of truth rather than a series of Excel files. As I said the most likely source of error is people uploading XLSX files into a Python script which will only work with CSVs. This has become an issue when third parties (universities) have had access so it stands to reason that the people doing the uploads didn't realise XLSX files won't work with Python.

    The file size limitation makes no sense at all, Excel has over a million rows available and the new case per column doesn't make sense either because none of the days had more than 16.5k cases reported. Most likely Politico have a political source who also doesn't understand how these things actually work.

    I've seen the XLSX-into-Python fuck-up loads of times; it's definitely something that can happen in such a disparate system with hundreds of health trusts, testing centres and now universities all reporting in separately.
    How does that work? Surely a Python script that is expecting a CSV file as input simply wouldn't work if given an XLSX file. The formats are completely different! There would surely be some indication that the process had failed, such as the lack of an output file, for a start.
    Well, it just doesn't work, and what happens next depends on what kind of error reporting the script has built in, what kind of queueing system there is and whether each upload is being properly monitored for success. As an end user (usually an admin assistant) I'm given instructions to prepare my upload in Excel with this structure, then save it as a CSV and upload it to the system. Loads of people are going to miss out the "save as CSV" step but won't stick around to see the error page (if there is one; as I said, people who write the scripts like to assume that everyone understands that Python doesn't work very well with XLSX files and so won't upload them).

    Don't get me wrong, I'm not absolving them of anything; it's a stupid error and has led to two weeks of under-reporting. However, I find the file size stuff difficult to believe, especially based on a source that Politico have. Once again it's the lack of communication from the government that is causing the issue. If it is a file format issue then I can understand it happening, with a script written in a rush probably using csv.reader instead of openpyxl, which is more difficult to handle and more prone to parse errors.
    That sounds very unlikely to me. Who would build an uploading system for such important data that performed no server-side verification at all? Surely you'd also have client-side verification to ensure that, at the bare minimum, the uploaded files had the correct extension!
    It probably does do server-side verification, but which end user is going to stick around to see the result of that? I just don't think they've built a robust system that handles every scenario you can throw at it.

    There's a huge difference between building a system that works in a perfect environment and one that works everywhere. The former is probably what has been cooked up in a short space of time.

    I'd honestly love to see the actual scripts they're using to put data into the database because they're probably extremely basic and probably don't have a fallback in case someone uploads an XLSX file.
    Does the library they're using not throw an exception when it starts trying to loop through a zip file rather than a CSV?
    It would, but it depends on who is monitoring the exceptions and whether the end user is actually notified of a script failure immediately or even by email if there is a queue.
    Okay...but should the team monitoring the application not also be receiving this information?
  • MaxPBMaxPB Posts: 38,868

    It boggles the mind even more when I think about it, that they were using columns.

    It's literally easier to not use columns, why on Earth were they doing that

    I just don't see how that's true, what is it even based on?
  • BarnesianBarnesian Posts: 8,604
    The steroids have really kicked in!

  • MaxPB said:

    It boggles the mind even more when I think about it, that they were using columns.

    It's literally easier to not use columns, why on Earth were they doing that

    I just don't see how that's true, what is it even based on?
    Do you mean the fact they were using columns, or the fact that it's easier to use rows rather than columns and MS makes it very clear what columns are used for?
  • It boggles the mind even more when I think about it, that they were using columns.

    It's literally easier to not use columns, why on Earth were they doing that

    It is also more common than you'd expect . . .
  • GallowgateGallowgate Posts: 19,468
    eek said:

    Dura_Ace said:

    Stocky said:

    DavidL said:

    Dura_Ace said:

    DavidL said:

    kinabalu said:

    DavidL said:

    Today is our 35th wedding anniversary. We are out for afternoon tea at the Old Course hotel later. The planned walk is looking a little problematic, however. Some minor roads are closed with flooding around here.

    Congrats! - Squeeze in a few holes too?
    Lord no, what a waste of time that would be. I find golf just beyond tedious. The only good things about it are the walk and the outdoors.
    The sandpit things on my local course are excellent for jumping my CRF250.
    You go to extraordinary lengths to achieve a certain level of popularity, don't you? Remarkable.
    Leave Dura-ace alone - he`s comedy gold.
    When I was 18 me and my mate did donuts on a golf course in his dad's W115 220 automatic. It open diffed and blew up the torque convertor on the second loop. His dad was livid and sent him to Sunderland Polytechnic as punishment.
    Great place Sunderland Poly. At least in its previous incarnation! Met my wife there 61 years ago!
    They have a nice law school there. :)
    Thought you went Geordie, not Mackem?
    There’s no difference between Geordies and Mackems.
    Writes a Lancastrian.....
    They have the same area code and are in the same county.

    They are synonyms for each other.
    Nope. My area code is 01661. Represent.
    So if I wanted to call Sunderland council and Newcastle council what are the first four digits?
    0191 - but then again I'm not posh and living somewhere around Ponteland... Equally I think 0191 gives you Durham County Council and the preferred university for posh Oxford and Cambridge rejects.
    Hey... I'll have you know that I live within the Newcastle upon Tyne city limits!
  • MaxPBMaxPB Posts: 38,868

    MaxPB said:

    MaxPB said:

    MaxPB said:

    MaxPB said:

    MaxPB said:

    Excel? WTF....

    I suppose if they tried using a database it would have been MS-Access
    It beggars belief that they are storing this data in a spreadsheet! Have they not heard of databases?
    They aren't, the API runs from a Microsoft SQL database and the PowerBI visualisations run from it too.
    I presume you're talking about the API for the Coronavirus Dashboard on gov.uk, whereas the issue with Excel storage seems to be further back in the data processing chain.
    I don't think Excel is used except to generate the CSV upload files and the database is the source of truth rather than a series of Excel files. As I said the most likely source of error is people uploading XLSX files into a Python script which will only work with CSVs. This has become an issue when third parties (universities) have had access so it stands to reason that the people doing the uploads didn't realise XLSX files won't work with Python.

    The file size limitation makes no sense at all, Excel has over a million rows available and the new case per column doesn't make sense either because none of the days had more than 16.5k cases reported. Most likely Politico have a political source who also doesn't understand how these things actually work.

    I've seen the XLSX-into-Python fuck-up loads of times; it's definitely something that can happen in such a disparate system with hundreds of health trusts, testing centres and now universities all reporting in separately.
    How does that work? Surely a Python script that is expecting a CSV file as input simply wouldn't work if given an XLSX file. The formats are completely different! There would surely be some indication that the process had failed, such as the lack of an output file, for a start.
    Well, it just doesn't work, and what happens next depends on what kind of error reporting the script has built in, what kind of queueing system there is and whether each upload is being properly monitored for success. As an end user (usually an admin assistant) I'm given instructions to prepare my upload in Excel with this structure, then save it as a CSV and upload it to the system. Loads of people are going to miss out the "save as CSV" step but won't stick around to see the error page (if there is one; as I said, people who write the scripts like to assume that everyone understands that Python doesn't work very well with XLSX files and so won't upload them).

    Don't get me wrong, I'm not absolving them of anything; it's a stupid error and has led to two weeks of under-reporting. However, I find the file size stuff difficult to believe, especially based on a source that Politico have. Once again it's the lack of communication from the government that is causing the issue. If it is a file format issue then I can understand it happening, with a script written in a rush probably using csv.reader instead of openpyxl, which is more difficult to handle and more prone to parse errors.
    That sounds very unlikely to me. Who would build an uploading system for such important data that performed no server-side verification at all? Surely you'd also have client-side verification to ensure that, at the bare minimum, the uploaded files had the correct extension!
    It probably does do server-side verification, but which end user is going to stick around to see the result of that? I just don't think they've built a robust system that handles every scenario you can throw at it.

    There's a huge difference between building a system that works in a perfect environment and one that works everywhere. The former is probably what has been cooked up in a short space of time.

    I'd honestly love to see the actual scripts they're using to put data into the database because they're probably extremely basic and probably don't have a fallback in case someone uploads an XLSX file.
    Does the library they're using not throw an exception when it starts trying to loop through a zip file rather than a CSV?
    It would, but it depends on who is monitoring the exceptions and whether the end user is actually notified of a script failure immediately or even by email if there is a queue.
    Okay...but should the team monitoring the application not also be receiving this information?
    One would imagine yes, but in either scenario (filesize or incorrect formatting) it seems that they weren't.
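
    A rough sketch of how failures could reach both the uploader and the monitoring team, assuming uploads sit in a queue as (filename, bytes) pairs. `process_queue` and `load_row` are made-up names for illustration. The point is only that every failure is both logged and returned in a report, rather than shown once on an error page nobody reads:

```python
import csv
import io
import logging

logger = logging.getLogger("uploads")


def process_queue(uploads, load_row):
    """Parse each queued upload as CSV and pass its rows to load_row.

    Returns a report listing which files loaded and which failed, so the
    same information can be sent to the uploader and to whoever monitors
    the system.
    """
    report = {"loaded": [], "failed": []}
    for name, raw in uploads:
        try:
            # Parse the whole file before loading anything from it.
            rows = list(csv.DictReader(io.StringIO(raw.decode("utf-8-sig"))))
            if not rows:
                raise ValueError("no data rows found")
            for row in rows:
                load_row(row)
        except (UnicodeDecodeError, csv.Error, ValueError) as exc:
            logger.error("upload %s failed: %s", name, exc)
            report["failed"].append((name, str(exc)))
        else:
            report["loaded"].append((name, len(rows)))
    return report


stored = []
queue = [
    ("trust_a.csv", b"date,cases\n2020-10-01,5\n"),
    ("trust_b.xlsx", b"PK\x03\x04\xff\xfe"),  # binary file uploaded by mistake
]
summary = process_queue(queue, stored.append)
print(summary["loaded"], [name for name, _ in summary["failed"]])
```

    A failed file stays in the report until someone deals with it, which is the step that seems to have been missing in either scenario.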
  • It boggles the mind even more when I think about it, that they were using columns.

    It's literally easier to not use columns, why on Earth were they doing that

    It is also more common than you'd expect . . .
    I don't doubt the incompetence that goes into people that use Excel for literally any task beyond drawing a pretty graph for a presentation
  • GrandioseGrandiose Posts: 2,323
    MaxPB said:

    It boggles the mind even more when I think about it, that they were using columns.

    It's literally easier to not use columns, why on Earth were they doing that

    I just don't see how that's true, what is it even based on?
    A Daily Mail article. Must be true.

    I think it's more likely that the columns (even if that were the case) were the result of whatever software might be outputting TO the spreadsheet and/or some issues with file formats (e.g. opening and resaving!).

    As you say it is impossible to think a human used 16,000 columns...
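
    To put numbers on the limits being argued about: an XLSX worksheet holds at most 1,048,576 rows and 16,384 columns, while the older XLS format stops at 65,536 rows and 256 columns. A small sketch, with hypothetical names, of a guard that fails loudly instead of dropping whatever doesn't fit:

```python
# Worksheet size limits per file format: (max_rows, max_columns).
SHEET_LIMITS = {
    "xlsx": (1_048_576, 16_384),
    "xls": (65_536, 256),
}


def check_fits(n_rows, n_cols, fmt="xlsx"):
    """Raise ValueError if a table of this shape would overflow the sheet."""
    max_rows, max_cols = SHEET_LIMITS[fmt]
    if n_rows > max_rows:
        raise ValueError(f"{n_rows} rows exceeds the {fmt} limit of {max_rows}")
    if n_cols > max_cols:
        raise ValueError(f"{n_cols} columns exceeds the {fmt} limit of {max_cols}")


check_fits(16_500, 10)  # one record per row: fits
try:
    check_fits(10, 16_500)  # one record per column: too wide
except ValueError as exc:
    print(exc)
```

    On the figures quoted in this thread, 16,500 records fit easily as rows in either format but would not fit as columns even in XLSX.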
  • RobDRobD Posts: 59,935
    edited October 2020

    MaxPB said:

    It boggles the mind even more when I think about it, that they were using columns.

    It's literally easier to not use columns, why on Earth were they doing that

    I just don't see how that's true, what is it even based on?
    Do you mean the fact they were using columns, or the fact that it's easier to use rows rather than columns and MS makes it very clear what columns are used for?
    The source for them using columns and not rows? It's just so absurd :D
  • Grandiose said:

    MaxPB said:

    It boggles the mind even more when I think about it, that they were using columns.

    It's literally easier to not use columns, why on Earth were they doing that

    I just don't see how that's true, what is it even based on?
    A Daily Mail article. Must be true.

    I think it's more likely that the columns (even if that were the case) were the result of whatever software might be outputting TO the spreadsheet and/or some issues with file formats (e.g. opening and resaving!).

    As you say it is impossible to think a human used 16,000 columns...
    Okay but even in that case, to use a spreadsheet as your source of truth compared to the/a db is nuts.
  • FlatlanderFlatlander Posts: 4,681

    MaxPB said:

    PHE using Excel for data tells you all you need to know about their expertise.

    More common than you think - the number of systems I've seen where you can upload data by loading spreadsheets.....
    That sounds so awful.
    The shit I've seen. When this is over, at the next PB beers I'll do a standup routine on "IT systems - the bad, the insane and the stuff with the wrong number of dimensions"

    I know way too much about the Java Apache POI library. Awesome though it is - for *generating* spreadsheets.

    Python is a great scripting language. But for serious computing...
    What do you mean by serious? Launching a nuclear strike or some data wrangling?

    For the latter I don't think it really matters that much what language you use. What matters is how well it is written and managed, and how well the writer understands what they are doing.
    It does matter. One of the problems with Python is the culture of the "developers"*. The number of times I have had to explain concepts of code structure and testing to Python and JavaScript... code writers..... There are a lot of "but it runs, ship it" Python hackers out there.

    *Many I wouldn't class as real developers
    Surely that's just an example of the writer not understanding what they are doing?

    I agree there is a certain culture around different languages, but it doesn't have to be like that. I'm sure I could write really terrible stuff in any language.
  • contrariancontrarian Posts: 5,818

    And still 'Green' types will be unhappy, because it won't be socialism.
    Whatever you do, don't get a smart meter.

    Energy companies want to use them to black you out when the above proposals don't produce nearly enough for our needs.

  • MaxPBMaxPB Posts: 38,868

    MaxPB said:

    It boggles the mind even more when I think about it, that they were using columns.

    It's literally easier to not use columns, why on Earth were they doing that

    I just don't see how that's true, what is it even based on?
    Do you mean the fact they were using columns, or the fact that it's easier to use rows rather than columns and MS makes it very clear what columns are used for?
    Columns for anything other than headers. It's mind bogglingly stupid.
  • Not only have they made their job more difficult by using Excel, they're also creating all sorts of potential issues with data corruption and a single point of failure!
  • ChrisChris Posts: 11,751
    Andy_JS said:

    Scott_xP said:
    Paradoxically the error might have been spotted earlier if older software/hardware had been used.
    "Back up"? What is this "back up" you speak of?
  • GrandioseGrandiose Posts: 2,323

    Grandiose said:

    MaxPB said:

    It boggles the mind even more when I think about it, that they were using columns.

    It's literally easier to not use columns, why on Earth were they doing that

    I just don't see how that's true, what is it even based on?
    A Daily Mail article. Must be true.

    I think it's more likely that the columns (even if that were the case) were the result of whatever software might be outputting TO the spreadsheet and/or some issues with file formats (e.g. opening and resaving!).

    As you say it is impossible to think a human used 16,000 columns...
    Okay but even in that case, to use a spreadsheet as your source of truth compared to the/a db is nuts.
    I meant output by the local system for transfer to the national system. Obviously that isn't the way you SHOULD do things, but not impossible to imagine.
  • GallowgateGallowgate Posts: 19,468

    Not only have they made their job more difficult by using Excel, they're also creating all sorts of potential issues with data corruption and a single point of failure!

    Don't be silly. There will be a backup file on Boris's desktop called "COVID-19 tests(1)(1) copy copy.xlsx".
  • tlg86tlg86 Posts: 26,176
    The problem with Excel is that people who don't know what they're doing can use it.
  • Grandiose said:

    Grandiose said:

    MaxPB said:

    It boggles the mind even more when I think about it, that they were using columns.

    It's literally easier to not use columns, why on Earth were they doing that

    I just don't see how that's true, what is it even based on?
    A Daily Mail article. Must be true.

    I think it's more likely that the columns (even if that were the case) were the result of whatever software might be outputting TO the spreadsheet and/or some issues with file formats (e.g. opening and resaving!).

    As you say it is impossible to think a human used 16,000 columns...
    Okay but even in that case, to use a spreadsheet as your source of truth compared to the/a db is nuts.
    I meant output by the local system for transfer to the national system. Obviously that isn't the way you SHOULD do things, but not impossible to imagine.
    Or they could write an API
  • GrandioseGrandiose Posts: 2,323

    Not only have they made their job more difficult by using Excel, they're also creating all sorts of potential issues with data corruption and a single point of failure!

    Don't be silly. There will be a backup file on Boris's desktop called "COVID-19 tests(1)(1) copy copy.xlsx".
    you need an "(autorecovered)" in there somewhere
  • RobDRobD Posts: 59,935
    .
    Grandiose said:

    Grandiose said:

    MaxPB said:

    It boggles the mind even more when I think about it, that they were using columns.

    It's literally easier to not use columns, why on Earth were they doing that

    I just don't see how that's true, what is it even based on?
    A Daily Mail article. Must be true.

    I think it's more likely that the columns (even if that were the case) were the result of whatever software might be outputting TO the spreadsheet and/or some issues with file formats (e.g. opening and resaving!).

    As you say it is impossible to think a human used 16,000 columns...
    Okay but even in that case, to use a spreadsheet as your source of truth compared to the/a db is nuts.
    I meant output by the local system for transfer to the national system. Obviously that isn't the way you SHOULD do things, but not impossible to imagine.
    That would make much more sense. There's no way that the entire thing is being managed from one excel workbook.
  • dixiedeandixiedean Posts: 29,411
    So is there an actual official figure for number of cases over the past few days?
    Or are we just guesstimating?
  • RobD said:

    .

    Grandiose said:

    Grandiose said:

    MaxPB said:

    It boggles the mind even more when I think about it, that they were using columns.

    It's literally easier to not use columns, why on Earth were they doing that

    I just don't see how that's true, what is it even based on?
    A Daily Mail article. Must be true.

    I think it's more likely that the columns (even if that were the case) were the result of whatever software might be outputting TO the spreadsheet and/or some issues with file formats (e.g. opening and resaving!).

    As you say it is impossible to think a human used 16,000 columns...
    Okay but even in that case, to use a spreadsheet as your source of truth compared to the/a db is nuts.
    I meant output by the local system for transfer to the national system. Obviously that isn't the way you SHOULD do things, but not impossible to imagine.
    That would make much more sense. There's no way that the entire thing is being managed from one excel workbook.
    Why would you do that though? Do these systems not have APIs that can be used?
  • AlistairAlistair Posts: 23,670

    Scott_xP said:
    An easy mistake to make. Who knew schools went back in September?
    And obviously no examples from a month earlier of test demand for under 17s rocketing when schools went back.
This discussion has been closed.