Wikipedia talk:Article size

Source: Wikipedia, the free encyclopedia.

Tables and lists

After a recent (and correct) edit, a paragraph in the guideline now reads:

Readable prose is the main body of the text, excluding material such as footnotes and reference sections ("see also", "external links", bibliography, etc.), diagrams and images, tables and lists, Wikilinks and external URLs, and formatting and mark-up. The measure may substantially underestimate the amount of content in articles that summarize much of their information in tables, especially when these contain notes and explanations in text columns.

I propose that it would make more sense to remove "tables and lists", and remove the newly added second sentence. Some articles (including some of our longest) consist almost entirely of lists (sometimes formatted as tables).  — SMcCandlish ¢ 😼  16:34, 25 August 2023 (UTC)[reply]

Your final sentence is true, but I'm having trouble seeing why the previous one follows from that. Could you explain? Nikkimaria (talk) 03:20, 26 August 2023 (UTC)[reply]
What's not clear? The "readable" article content at a long list is the list. The current wording a) creates a loophole such that list articles are not subject to length limits at all, and b) another loophole whereby an article that consists of, say, 75% a list ignores the entire list for purposes of length calculation. I doubt anyone actually agrees that's a good idea. Hell, it could be a [bad] excuse to convert prose material into inappropriate lists/tables, just to skirt the length guidelines.  — SMcCandlish ¢ 😼  05:10, 26 August 2023 (UTC)[reply]
I am inclined to concur that tables and lists should be treated like normal wikitext. Jo-Jo Eumerus (talk) 07:40, 26 August 2023 (UTC)[reply]
It is unclear to me why tables and lists should be treated like normal wikitext for the purpose of article size. Size limits are to do with readability, and tables are for data presentation. I am unaware of people who look up lists to read from beginning to end. Tables and lists are reference material, while articles are a presentation of information about a particular subject, which I judge to be completely different subjects. As an engineer, I have a steam table book written in the 50s (prior to computers and the internet) that is almost entirely tables about temperature and pressure for various fluids. I do not believe anyone would read the book from beginning to end as a subject matter description of steam temperature and pressure, one would just go to the table needed for the particular values. The point I am trying to make is that tables and lists should not be subject to readability limits, but certainly should be subject to technical limits, such as maximum character limit, or limits on how many citations can be included before the article breaks, or general reports on slow-down on download speed on limited access machines such as commonly used smartphones in nations with more limited data carriers. But putting a size limit on tables and lists based on the subjective readability limits would not be a good idea. It is not a loophole, it is a different perspective. Mburrell (talk) 21:58, 26 August 2023 (UTC)[reply]
As people have noted in the paragraphs above, though, people mostly don't read articles top to bottom, either. Jo-Jo Eumerus (talk) 17:46, 27 August 2023 (UTC)[reply]
Agree with Mburrell. SandyGeorgia (Talk) 11:40, 30 October 2023 (UTC)[reply]
I agree too. Also, at least with tables, collapsing them can put them out of sight and out of mind. Riposte97 (talk) 22:15, 30 October 2023 (UTC)[reply]
Ah, okay, I misunderstood. I don't object to the principle but we may need to deal with the fact that the added sentence is true wrt the tools often used for assessment of this issue. Nikkimaria (talk) 13:12, 26 August 2023 (UTC)[reply]
MOS uses the term footers) to refer to the bottom matter (another term!) that we'd like to exclude, so maybe we could borrow that. Mathglot (talk) 09:23, 26 August 2023 (UTC)[reply]
Sure, but it still shouldn't include "tables and lists" which are part of the main-body content of the article.  — SMcCandlish ¢ 😼  05:24, 26 October 2023 (UTC)[reply]
Disagree entirely that tables and lists should be added; the issue is readable prose, and tables are skimmed. SandyGeorgia (Talk) 11:39, 30 October 2023 (UTC)[reply]
Tables and lists wouldn't be considered the readable prose in prose articles, but they would be considered the readable prose, or whatever is closest to that, for list articles. Onetwothreeip (talk) 09:43, 9 November 2023 (UTC)[reply]
The problem with tables and lists is that we don't have an automated tool for counting the "readable prose" in them. The reason is that we have not determined a way of counting the text in them. We need to agree on this first. Only then can we consider size limits. Hawkeye7 (discuss) 18:22, 23 November 2023 (UTC)[reply]

On this point, I recommend Faked death as an example of the problem. It's 1,000 words if you exclude "lists". It's 4,000 words if you don't. In this instance, the latter is the correct/relevant number. WhatamIdoing (talk) 07:20, 26 December 2023 (UTC)[reply]

I think we may be running into an "I know it when I see it" problem. Correct me if my assumption here is wrong, but I think we would all wish to exclude the tables of (mostly) figures in List of municipalities in Alberta for purposes of prose calculation (because it ain't prose), but does anyone really want to exclude the table content in Wikipedia:Reliable sources/Perennial sources? The content of that table *is* prose (well, col. 5, anyway), and I want to count it for any prose calculation. (Yes, I know that's not an article, I just don't have a sample article at hand with long prose sections, and I'm too lazy to look now; please help me out by linking one.) Admittedly, that makes automated counting solutions more complex and that's unfortunate, but I don't think it would be fair to either include tables in both of those pages in the count, or to exclude tables in both. They require separate treatment, imho. Agree? Disagree? Mathglot (talk) 08:12, 26 December 2023 (UTC)[reply]

I agree that we want to count the full article content but not simple lists and data tables (e.g., undescribed lists of notable people, tables of sports scores, names of songs in an album).
I think that the problem should be solved in documentation, rather than code. That means that we say something like "You can use Wikipedia:Prosesize, but be aware that it undercounts the words in articles that have significant material formatted as lists or tables." Also, perhaps we should document the "exact numbers not important" part. When we say "10,000 words", we mean something like "9,000 to 11,000 words" – not 9,999 to 10,001 words. WhatamIdoing (talk) 17:13, 26 December 2023 (UTC)[reply]
Just adding a link to Doctor Who (series 4)#Episodes, which has plenty of prose table content, replacing my poor example above. This link is thanks to helpful Teahouse responder User:Deltaspace42, who also points out that "pretty much all pages about episodes of TV series contain such table[s]". As I wasn't entirely sure that such tables existed in mainspace, it's very helpful to have this example, and to find out that it's representative of an entire class of articles. Mathglot (talk) 04:35, 28 December 2023 (UTC)[reply]
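
As a rough illustration of why the numbers in this thread diverge (1,000 versus 4,000 words for the same page), here is a minimal Python sketch that counts words in a wikitext string with and without list and table lines. It is not how the Wikipedia:Prosesize gadget works; the function name `count_words`, its `include_lists_and_tables` parameter, and the sample text are all hypothetical, and the parsing is deliberately naive (it only recognises lines starting with `*` or `#` as list items and `{|` ... `|}` as table bounds).

```python
import re


def count_words(wikitext: str, include_lists_and_tables: bool = False) -> int:
    """Naive word count over wikitext (illustrative assumption, not Prosesize).

    Lines beginning with '*' or '#' are treated as list items, and lines
    between '{|' and '|}' as table content. They are skipped unless
    include_lists_and_tables is True.
    """
    total = 0
    in_table = False
    for line in wikitext.splitlines():
        stripped = line.strip()
        if stripped.startswith("{|"):
            in_table = True
            continue
        if stripped.startswith("|}"):
            in_table = False
            continue
        is_list_item = stripped.startswith(("*", "#"))
        if in_table or is_list_item:
            if not include_lists_and_tables:
                continue
            # Drop list bullets and table cell/header markers before counting.
            stripped = stripped.lstrip("*#|! ")
        total += len(re.findall(r"\w+", stripped))
    return total


# Hypothetical sample: one prose sentence, two list items, one table cell.
sample = """Intro sentence here.
* first item
* second item
{|
|-
| cell one
|}
"""

print(count_words(sample))
print(count_words(sample, include_lists_and_tables=True))
```

The two printed figures differ only because of the list and table lines, which is the gap being debated above; deciding which table cells count as prose (figures versus explanatory text) is exactly the part this sketch does not attempt.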

Quality and the 15,000 guideline

Hi everyone, guidelines say articles over 15,000 'readable' words, "Almost certainly should be divided or trimmed." 15,000 comes from a compromise relating to a 2007 change to do with 100 kilobytes. I.e. 15,000 is not based on what would lead to higher quality, more readable articles, hence the readability discussion above. I've read the links helpfully posted by @Peter Isotalo and not found anything useful on readability. But what about using verifiable evidence on quality, which is very related to readability. Quality is not mentioned once in the article size guidelines? I looked at recently promoted featured articles - October 2023 - and found the largest was about 12,000 words. We could analyse 'recently' promoted featured article maximum length to help improve the guidelines, or put them on a better footing? Grateful for evidence on quality and readability, Tom B (talk) 20:05, 23 November 2023 (UTC)[reply]

The problem is that quality is a different thing than quantity. Badly written text is badly written, no matter whether it's 1500 or 150000 words long. Jo-Jo Eumerus (talk) 07:24, 24 November 2023 (UTC)[reply]
There's a semi-informal 10k limit for FAs, so this is to be expected; there's no causality. DFlhb (talk) 10:35, 24 November 2023 (UTC)[reply]
hi @DFlhb, thank you, there is a formal length requirement yes, but no exact number as you intimate. I was surprised to find a recently promoted article at 12k, that might effectively be the informal limit? For me and others there is causality, the informal limit aids quality, Tom B (talk) 16:05, 24 November 2023 (UTC)[reply]
  • Neither quality nor readability is a sensible reason to sub-divide articles, because quality has nothing to do with size while readability has to do with the chunking and navigational structure of topics at multiple levels – sentence, paragraph, section, page, topic, category and so forth.
The real issue is the technical size of the page and this seems to be most affected by the number of templates rather than the amount of prose. For example, the popular page Deaths in 2023 has an edit note that "References should be in <ref>[url & title]</ref> format, as full citations make the page too slow to load, and too big to edit." Andrew🐉(talk) 12:37, 24 November 2023 (UTC)[reply]
@Andrew Davidson, thank you, we have a simple disagreement: you say quality has nothing to do with size, I say it does. For me the Napoleon article increases in quality from 1,000 words to 8,000 and starts decreasing before about 12,000 words. I got it promoted to GA at 8,000 words and it got demoted at 18,000. Don't most think the quality decreases at some point? We just disagree when? I appreciate it will be different amounts for different articles. I don't think technical size is the big issue any more. The consensus appears to be that readability is now key? Some think we should remove the limit, some like me think we should reduce it e.g. to 12,000, but I'm open to evidence, others might think the 15,000 guideline is fine. Everyone thinks their position will improve quality or not affect it? Tom B (talk) 16:26, 24 November 2023 (UTC)[reply]
Tpbradbury said,

Don't most think the quality decreases at some point?

That reminds me of Salieri's, "Too many notes" in Amadeus. In some cases it may, but not all, and it's not purely a function of length, imho, but of other factors. I believe that there are various human factors involved, one of which is the icons related to quality article awards. I notice many user pages with a string of GA or FA icons, and while I respect the work involved and applaud the improvement to the encyclopedia by these volunteer editors, once an article achieves the award, what happens then? Do these editors continue watching, improving, pruning, maintaining quality as the article grows from 8k to 18k after the award, or do they move on to something else? I'm not ashamed to say that I'd probably move on; I think that's human nature in large part (although I'm aware that some articles have long-term, non-OWNy watchers that remain active and I think that's commendable).
I think another human factor that affects quality that is size-related in a way difficult to quantify is the basic structure of the article as manifested by the choice of section headers, how many of them there are, how deeply nested, and how much content in them. Section headers are the musculo-skeletal system of an article, and the larger it grows, the more difficult it becomes to move sections around, or to disassemble them and reorganize along different lines. Partly this is simply mechanical: moving a top level section with 22kb of content to a different point in the article requires finding the begin and end points, cutting it, finding the destination point, and pasting it. If you've done this, you know it's tedious to prepare and a bit white-knuckly to execute even for the tech-savvy, and probably scares away many editors as not worth the effort. Better tools designed for manipulating article sections could mitigate that problem.
Far more difficult imho, and less amenable to new tools, is analyzing the section structure of an article, realizing that it could be improved by a different organization, designing a new structure, and moving the article towards that goal. That's not so hard for a small stub, but as it grows beyond a stub and gains section headers, my impression is that there are fewer and fewer editors willing to take a 40,000-foot view and reorganize the basic structure. When the original content is extremely poor in quality, WP:TNT is an option, and I've done this two or three times with buy-in at Talk, but if quality is not at the extreme end of bad and merely 'poor', there may be opposition to it which may make it impossible to carry out. The result is that articles tend to suffer from a kind of atherosclerosis as they grow, making it harder and harder to do a complete overhaul even if you could find a wiki-surgeon willing to tackle it, and it happens much earlier imho than the size limits mentioned in the table as split territory.
I think this paradigm, if accurate, puts a lot of pressure on editors to get the basic organizational structure of the article right fairly early on while it's still relatively easy to adjust in order to avoid sclerosis later, but that doesn't always happen. Maybe a new type of reviewing team could help, sort of like Afc but with the goal of having a second look at articles around the time they transition from stubs to start class with a view to establishing a solid section structure amenable to future growth before the article grows too big and locks in something less than optimal. Mathglot (talk) 21:50, 26 December 2023 (UTC)[reply]
In answer to your first question: most of us remain watching the articles as shepherds. Libel, nonsense and vandalism gets reverted but additions are not removed unless they are unreferenced. (I had a particular problem with Frank Borman when IPs started posting that he had died. I had to revert them until the news was reported in a RS.) FAs are comprehensive by nature and rarely grow although some topics like Batman by their nature require ongoing updates. Occasionally you get called back to an old FA when there is an FAR. GAs though can be substantially updated or rewritten.
As a rule, articles will increase in size as new material is added since old material is only removed when a subarticle is created. A recent case of what you are talking about is John von Neumann. He was a polymath, which is to say a complicated subject from our point of view. Normally biographical articles are organised chronologically. The article had grown organically as a result of a series of editors (including myself) who had very different interests and areas of expertise. The readers were probably just as diverse. The obvious path to the article's growth was to create a series subarticles on the different areas of von Neumann's interest, wherein readers could find the detailed information that they were looking for. The problem was that this would involve major restructuring of the main article and considerable work setting up the subarticles and then summarising them. The issue that was then debated at length was whether this was worth the effort when the only issue was the size of the article (15,000 words). Hawkeye7 (discuss) 00:20, 27 December 2023 (UTC)[reply]
That article is an example in similar space to what Mathglot discusses. The main issues in the John von Neumann article case were not size per se, they were issues with factual accuracy and writing quality. Even after substantial reworking, the most recent major edit to that article was to delete a subsection as being apparently fundamentally misguided. The length there served as a warning that brought the other issues to light, but presumably also made maintenance difficult in the preceding period as it was an increasingly large amount to monitor. CMD (talk) 02:29, 27 December 2023 (UTC)[reply]

Removing kb limits

My understanding is there is strong consensus to remove the kb limits thus:

Readable prose size What to do
> 15,000 words Almost certainly should be divided or trimmed.
> 9,000 words Probably should be divided or trimmed, although the scope of a topic can sometimes justify the added reading material.
> 8,000 words May need to be divided or trimmed; likelihood goes up with size.
< 6,000 words Length alone does not justify division or trimming.
< 150 words If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, the article could be expanded; see Wikipedia:Stub.

But @Onetwothreeip reverted with "I am sympathetic to some kind of change like this, but really need a strong consensus". I thought we only needed consensus, but I'm not used to guideline changes. Is there consensus to remove the kb limits or good reasons to retain them? Tom B (talk) 15:48, 26 November 2023 (UTC)[reply]
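
For readers who want the word-only table above in a form they can test against a count, here is a minimal Python sketch of the thresholds exactly as quoted. The function name `size_guidance` is hypothetical, and the fallback message for counts the table does not cover (150 to 149... that is, the ranges it leaves unstated, such as 6,000 to 8,000 words) is an assumption of this sketch, not guideline text. As noted earlier on this page, the figures are approximate, so a result near a boundary should not be read literally.

```python
def size_guidance(words: int) -> str:
    """Map a readable-prose word count to the guidance quoted in the table.

    Thresholds are copied from the proposed table; the final fallback string
    is an assumption for ranges the table does not mention.
    """
    if words > 15_000:
        return "Almost certainly should be divided or trimmed."
    if words > 9_000:
        return "Probably should be divided or trimmed."
    if words > 8_000:
        return "May need to be divided or trimmed; likelihood goes up with size."
    if words < 150:
        return "Consider combining with a related page, or expanding."
    if words < 6_000:
        return "Length alone does not justify division or trimming."
    return "No specific guidance in the table for this range."


# Example: the 12,000-word featured article mentioned in the earlier section.
print(size_guidance(12_000))
```

Writing it out this way also makes the table's gap visible: nothing is said about articles between 6,000 and 8,000 words.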

I'd be in favour, but I think we need to make sure that WP:SIZESPLIT has the same units, so as not to confuse people. When we added the word counts earlier, some people suggested a transition period of a few years to ease people referring to the old kb units. —Femke 🐦 (talk) 16:02, 26 November 2023 (UTC)[reply]
A few years? Tom B (talk) 16:14, 26 November 2023 (UTC)[reply]
Given that people rarely look at guidelines they're familiar with, I think a year may be quite reasonable. I would prefer a change now, as the word limit suggestions are easy enough to interpret. —Femke 🐦 (talk) 16:18, 26 November 2023 (UTC)[reply]
When asked, I have told my students for years about page length over kb (as the academic world does). "Articles should range between 8,000 and 10,000 words, or approximately thirty-five pages in length, and would include a 150 word lead." Moxy- 16:20, 26 November 2023 (UTC)[1][2][3][4][5][6][7][8][reply]
I see no reason to remove the byte guideline. Each article's word count is not going to be immediately apparent. It is right that the word count is the more prominent guideline, but the byte count is simply an equivalent to that. However, I would support a change that makes it clearer that this table applies to articles that are predominantly prose content. Onetwothreeip (talk) 21:05, 26 November 2023 (UTC)[reply]
Au contraire; few (even experienced) editors understand how to calculate readable prose, and quite often misquote the KB when referring to size. I support removal of the confusing and dated KB bits. SandyGeorgia (Talk) 21:09, 26 November 2023 (UTC)[reply]
I would think that calculating readable prose is simply a matter of counting how many words there are in the prose of the article. I accomplish this myself by copying and pasting the text into something that counts words. Saying that a certain article kilobyte size of a prose article is generally equivalent to a certain amount of words should be complementary to the word count guideline. If it's being misquoted or confused, we should rewrite the guidelines or add clarification instead. Onetwothreeip (talk) 21:39, 26 November 2023 (UTC)[reply]
Rewriting doesn't help considering the most frequent misapplication of this guideline is by people who don't read it. They look at the page, see the KB, and apply that without any knowledge of how readable prose is calculated. I see this routinely (and we're still seeing it, even on this page). Removing KB is the way to go here, as it is not a good approximation of readable prose. SandyGeorgia (Talk) 12:55, 27 November 2023 (UTC)[reply]
My impression is that few or no people have defended using kb size as a metric in the discussion so far. Granted it's not a huge number of participants. And it might be worth thinking about having a guideline on kb size as well, given technical issues ... but on a different page. Jo-Jo Eumerus (talk) 10:02, 28 November 2023 (UTC)[reply]

I support Tom B's edit. We really need to accept that Wikipedia is NOTPAPER. Traditional size limits are a thing of the past, with the exception of mobile users.

In fact, I support removal of any form of maximum total word or byte length for articles and instead support "ease of access limits" on section size. It should be possible to easily hop to a section and open it using a mobile device. AFAIK, unduly large sections might be problematic. (Maybe not, so let's discuss that.) Section size is more of a concern than total article size. Finding info by searching is not a problem, including trillions of bytes. OTOH, "opening" an unduly large section might be a problem for some users. -- Valjean (talk) (PING me) 17:19, 26 November 2023 (UTC)[reply]

Remove KB, keep readable prose word limits; we've covered this at length elsewhere on this page. SandyGeorgia (Talk) 13:06, 27 November 2023 (UTC)[reply]

Counting the number of words has become much easier than it used to be. The Wikipedia:Prosesize script is available in Special:Preferences#mw-prefsection-gadgets. It adds a "Page size" item to end of the Tools section (sidebar or dropdown menu, depending on your skin). WhatamIdoing (talk) 18:44, 27 November 2023 (UTC)[reply]
While these tools are great, they are unfortunately not accessible to the majority of Wikipedia editors, who are not going to know how to use them. That is why guidelines like these are important. Onetwothreeip (talk) 20:17, 27 November 2023 (UTC)[reply]
@SandyGeorgia: Do you have any examples of the guideline being misapplied in such a way? Onetwothreeip (talk) 20:15, 27 November 2023 (UTC)[reply]
So that I don't have to troll back through my contribs for an example, look no further than a spinoff from the discussion here, which uses overall size in KB, rather than readable prose in words, for articles that don't necessarily need to be split. SandyGeorgia (Talk) 22:55, 27 November 2023 (UTC)[reply]
User:Onetwothreeip: you don't own this guideline. You are the only one to disagree with simplifying in this direction, please stop reverting. This is not a major change in the guideline, as the conversion is mostly constant between articles. Getting readable prose in kb or words requires mostly the same tools, so that's not an argument against simplifying either. —Femke 🐦 (talk) 11:10, 29 November 2023 (UTC)[reply]
@Femke and Tpbradbury: It's a major change to the guideline. I said there was "not much" recent discussion, meaning not enough. There have only been a few participants on this when this is a part of the guideline which gets quoted very often. I support change in this area, but the kB equivalents are useful for editors who cannot or don't want to use tools to find the exact word count. Onetwothreeip (talk) 19:36, 29 November 2023 (UTC)[reply]
They would still need the exact same tools to find the kilobyte equivalent in readable prose, right? We've had a transition period where we showed both. We have no one objecting to this removal of confusing kilobytes, not even you. Let's not make Wikipedia more bureaucratic than it already is. —Femke 🐦 (talk) 19:46, 29 November 2023 (UTC)[reply]
Your revert summary said there has not been recent discussion, but that is untrue. So why did you revert? Tom B (talk) 11:47, 29 November 2023 (UTC)[reply]
Following from above, we can avoid going to a full community discussion by refactoring the table to include the kilobyte measures as supplementary detail equivalent to the word count measures, while still emphasising the primacy of the word count measures. Onetwothreeip (talk) 19:43, 29 November 2023 (UTC)[reply]
  • WP:CAREFUL, please. There's nothing like consensus for a change in the discussion here. Suggest a RFC. VQuakr (talk) 20:56, 29 November 2023 (UTC)[reply]
    Do we really need an RfC for removal of a unit that continues to confuse? I think one or two more people engaged in this discussion should be able to establish a consensus strong enough for everybody to be happy. @VQuakr: would you be willing to weigh in yourself? —Femke 🐦 (talk) 21:05, 29 November 2023 (UTC)[reply]
    Yes, a RFC is absolutely warranted to establish consensus for a change to a guideline like this. Personally I think the kB measurement is fine, though supplementing with words to match the format that the existing page size tool produces might be helpful, but the discussion on this has been sprawling and fragmented. A RFC would help not only to establish consensus but to distill the reasoning in a more terse, readable format that might help me refine my personal position. Or to put another way, we need a WP:SIZE guideline on sizing guideline discussions. :) VQuakr (talk) 21:09, 29 November 2023 (UTC)[reply]
    There's no need for an RfC over something so minor.
    I agree with SandyGeorgia that kB limits mislead some users; I've seen it repeatedly. The total page kB is more prominent and easier to find than the prosesize kB. DFlhb (talk) 21:46, 29 November 2023 (UTC)[reply]
    Mmmkay but there isn't consensus for a change, so correct we don't need an RFC as long as you're good with the status quo. VQuakr (talk) 22:18, 29 November 2023 (UTC)[reply]
    I count 5 editors for, 3 editors who didn't express a clear view but seem to lean 'for', and only you and 123IP opposing. DFlhb (talk) 12:55, 30 November 2023 (UTC)[reply]
    User:VQuakr: "Personally I think the kB measurement is fine": what do you see as the added value of the kB unit? In terms of consensus, I read 5 people explicitly supporting, one or two who are not explicit but seem to defend arguments in favour of simplification (?), onetwothreeip saying they are sympathetic to the change, and objecting mostly on procedural reasons, and you. —Femke 🐦 (talk) 13:01, 30 November 2023 (UTC)[reply]
    "I count 5 editors for..." right, this isn't a sufficient level of involvement to change a guideline per WP:CONLEVELS. VQuakr (talk) 17:17, 30 November 2023 (UTC)[reply]
    VQuakr, you're the 1 editor who wants to revert back and include kb limits? it's not worth an RFC because one editor wants to include info everyone else wants to remove? Tom B (talk) 18:17, 30 November 2023 (UTC)[reply]
    No, I am not the "1 editor" who disagrees with this proposal. VQuakr (talk) 18:19, 30 November 2023 (UTC)[reply]
    Who are the other editors who want to retain the kb limits pls? Tom B (talk) 18:22, 30 November 2023 (UTC)[reply]
    You started this section, no? Surely you are aware of the participants in the ensuing discussion, which makes me think maybe I'm not understanding your question. But it seems moot regardless, as there hasn't been enough involvement yet to establish any consensus for a change. To recap, you started this section with "My understanding is there is strong consensus to remove the kb limits...", even though such a consensus did not and does not exist. Editing policies and guidelines is both hard and methodical, which is why I've suggested a RFC. I'm frankly not understanding why there is resistance to that, as it is a quite routine step in making this sort of change to a guideline. If you, Femke, and others are correct that this is a slam-dunk proposal then it will garner wide support, so a RFC will just help your case by getting a sufficient level of involvement in the approval of the proposal. VQuakr (talk) 18:35, 30 November 2023 (UTC)[reply]
    Let's keep focussed on content. VQuakr, what do you see as the added value of having the kB in there?
    We can start an RfC, but I see editor hours as something precious that I do not want to call on unnecessarily. This removal of a confusing unit does not change the guideline in any practical sense, so I do not see the need for an extraordinary level of consensus. —Femke 🐦 (talk) 19:05, 30 November 2023 (UTC)[reply]
    Neither words nor kB are confusing units of measurement. While I agree this doesn't substantially change the spirit of the guideline, it literally does change it practically speaking; we're changing the units of measurement and there seems to be resistance to the idea of retaining both per the discussion above. Agreed there is no need for an extraordinary level of consensus, but there is a need for consensus. We've got a guideline that has referenced kB of readable prose for well over a decade; more than a half-dozen editors' involvement is warranted before changing. VQuakr (talk) 19:10, 30 November 2023 (UTC)[reply]
    In case you missed it, there is a recent example from a smart and experienced user [1] confusing the easily accessed markup size with readable prose size in kb. If experienced users get confused, how do newer users navigate this? The resistance about retaining both is therefore warranted.
    Let's agree to disagree on the level of consensus. Not sure if I want to open an RfC.. —Femke 🐦 (talk) 19:27, 30 November 2023 (UTC)[reply]
    You're conflating readable prose size vs units of measurement. We care about readable prose, not raw page size, whether we're measuring in kB or words. No, I do not find a single example of someone being temporarily confused remotely convincing that there is a problem or that the proposed change is a solution. I'm also not clear on how your final sentence "The resistance about retaining both is therefore warranted" logically follows from anything you've said prior. VQuakr (talk) 19:38, 30 November 2023 (UTC)[reply]
    Ah, I now understand. I thought you meant "there is resistance against retaining both", which I agreed with but found a confusing statement given your preceding comments. About is ambiguous here; I should have done a bit more thinking before replying. Funny, to have a misunderstanding of this kind when talking about a guideline with a similar ambiguity for the poor reader. —Femke 🐦 (talk) 20:03, 30 November 2023 (UTC)[reply]

I realize this discussion is a couple of months old at this point, but I wanted to push back on the opening statement:

My understanding is there is strong consensus to remove the kb limits...

I strenuously object to any change which results in the appearance of prose size (or equivalent expressions such as readable prose) as the sole column header or sole yardstick of article size. Doing so leads to complete absurdities, such as assessing our #1 longest article in main space with over 35,000 words, as having only "45 prose words" if prose size is the measure. Please see WT:Splitting#The term 'prose size' for details. I've restored word count in the column header pending discussion of this. There is nothing wrong with taking multiple measures into consideration, and that is what should be done here; there is absolutely nothing wrong with considering raw kb count as one of those measures. Mathglot (talk) 02:03, 25 February 2024 (UTC)[reply]

Well, the problem is that kb counts are misleading people into thinking the HTML/code size is important. "kb" doesn't necessarily mean HTML, but a lot of people are interpreting it as such. Jo-Jo Eumerus (talk) 08:31, 25 February 2024 (UTC)[reply]
I'm not sure I fully understand your reasoning, but I support the change, as word count is plain English, and readable prose is not. —Femke 🐦 (talk) 08:40, 25 February 2024 (UTC)[reply]
Agreed; but playing devil's advocate for a moment against my own position, the opposing argument might go: "Perhaps so, but prose words is well-defined and we can generate a replicable, exact count with a tool, whereas word count is poorly defined: what does it even include?" There is some validity to that, but I think the weakness of prose words and the absurd example resulting from its weakness outweighs that argument. In the end, I don't think we have any one statistic that by itself is sufficient to make the call about splitting, and we should recognize that there are multiple factors involved. Imho, ultimately the table should have additional columns, to provide more information for making a good decision. Mathglot (talk) 09:36, 25 February 2024 (UTC)[reply]
IP non sequitur.
You’re moving too fast forward 24.35.154.137 (talk) 23:06, 27 March 2024 (UTC)[reply]

Collapsed per WP:TALK: not improving the article. Your Teahouse comment was equally pointless. Mathglot (talk) 00:09, 28 March 2024 (UTC)[reply]

Yes, a simplifying change. The risk is that people count the references etc., but hopefully the readable prose definition underneath will mitigate that. Tom B (talk) 17:58, 26 February 2024 (UTC)[reply]

Aside on markup size

@VQuakr and Tpbradbury: (and others) I think a reasonable solution would be for the table to indicate that an article of 15,000 words is on average a certain number of kilobytes in markup size, and so on for each row. This would provide an easy equivalent for editors to use, maintain consistency with guidelines, and promote the primacy of considering length in terms of word count. Onetwothreeip (talk) 09:16, 30 November 2023 (UTC)[reply]

That's not feasible. There is no one-to-one conversion between markup size in kb and readable prose length (in either kb or words). Some articles are strongly cited, with detailed citation information. Others are only partially cited, or have a very short citation style. A strongly cited article can easily be three times the size of a weakly cited article for the same word length. —Femke 🐦 (talk) 12:52, 30 November 2023 (UTC)[reply]
I agree that markup size isn't relevant to a discussion about article length, which, whether in words or kB, is a discussion about readable prose, not raw size. VQuakr (talk) 17:17, 30 November 2023 (UTC)[reply]
Femke, that is why I am saying that the conversion should be a rough average, essentially an estimate. We could even include a range, but we don't need to account for outliers. Markup size is an indication of readable size whether we like it or not, and often the most accessible indication. Onetwothreeip (talk) 19:19, 30 November 2023 (UTC)[reply]
There is going to be a correlation on average, but it varies such a large amount that it's doubtful the average will be at all helpful for application to an individual article. (I would also suggest that the two are confused enough already, and that further confusion does not aid discussion of prose or actual issues with markup size such as WP:PEIS.) CMD (talk) 05:16, 1 December 2023 (UTC)[reply]
I agree. An average may be interesting, but it has no proscriptive relevance to any weird "paper mind" (those who don't fully understand and apply NOTPAPER) ideas of an "ideal" article's size. Articles will naturally fall along the X-axis of a bell curve with extremes on each side, and no attempts should be made to shorten long articles in an attempt to make them more "average" in length. Both very short and very long articles have their place. -- Valjean (talk) (PING me) 06:24, 1 December 2023 (UTC)[reply]
This would have no effect on your key point; however, for the sake of accuracy, article sizes by no means follow a bell curve. I'd wager there's a Y-maximum at around X=9kb followed by a slow decline with a long tail, ending at 844kb. Mathglot (talk) 20:54, 30 December 2023 (UTC)[reply]
Some articles are also very markup-heavy for other reasons, such as lots of {{lang}}.  — SMcCandlish ¢ 😼  03:50, 1 December 2023 (UTC)[reply]
To extend this point: outside the ASCII character set, many characters take two or three bytes to be expressed in UTF-8. This is more likely to be a factor in articles involving foreign scripts, mathematics, symbols, and other non-ASCII characters. Mathglot (talk) 11:19, 29 December 2023 (UTC)[reply]
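A minimal sketch of the byte-count point above, assuming Python 3 and its built-in `str.encode`. The helper name `utf8_byte_length` and the sample characters are illustrative only and do not come from any Wikipedia tool. It shows that a size measured in bytes can exceed the character count once non-ASCII text is involved.

```python
def utf8_byte_length(text: str) -> int:
    """Return the number of bytes needed to store text in UTF-8."""
    return len(text.encode("utf-8"))


# Illustrative samples: one ASCII letter, one accented Latin letter,
# one currency symbol, and one CJK character.
samples = ["A", "é", "€", "中"]

for ch in samples:
    # Each sample is a single character, but its UTF-8 size varies.
    print(f"{ch!r}: 1 character, {utf8_byte_length(ch)} byte(s)")
```

Under these assumptions, two passages with the same number of characters can differ in kB purely because of the script they use, which is one more reason a byte count is a rough proxy for length.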

Refs (kb limits)

  1. ^ "European Journal of Futures Research". SpringerOpen. May 20, 2013. Retrieved November 26, 2023.
  2. ^ "instructions". academic.oup.com. Retrieved November 26, 2023.
  3. ^ "Manuscript Submission Guidelines: AERA Open: Sage Journals". Sage Journals. January 1, 2023. Retrieved November 26, 2023.
  4. ^ "Early Modern Women: An Interdisciplinary Journal: Instructions for authors". Early Modern Women: An Interdisciplinary Journal. November 17, 2019. Retrieved November 26, 2023.
  5. ISSN 0012-155X.
  6. ^ "Submissions". Global Labour Journal. February 3, 2022. Retrieved November 26, 2023.
  7. ^ "BGSU SSCI Journal Publishing Guide" (PDF). Retrieved November 26, 2023.
  8. ^ "Guide for authors". ScienceDirect.com by Elsevier. January 6, 2016. Retrieved November 26, 2023.