
Talk:File size

From Wikipedia, the free encyclopedia

Comparing files and disks


Western Digital had this footnote in their settlement. "Apparently, Plaintiff believes that he could sue an egg company for fraud for labeling a carton of 12 eggs a “dozen,” because some bakers would view a “dozen” as including 13 items."

I think this is an excellent metaphor (or illustration). One extra egg raises the size of a "dozen" by 8.3%, which is roughly the amount by which the software industry has increased the size of a gigabyte (7.4%) or a terabyte (10.0%). --Uncle Ed (talk) 19:19, 20 December 2007 (UTC)
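A quick check of the arithmetic behind the egg comparison: one extra egg over a dozen, versus the binary units (2**30 and 2**40 bytes) over their decimal counterparts (10**9 and 10**12 bytes). This is only a sketch that computes the ratios; the helper name is made up for illustration.

```python
def excess_percent(actual: int, nominal: int) -> float:
    """How much larger `actual` is than `nominal`, in percent (hypothetical helper)."""
    return (actual / nominal - 1) * 100

baker_dozen = excess_percent(13, 12)       # 13 eggs instead of 12
gigabyte = excess_percent(2**30, 10**9)    # binary vs. decimal gigabyte
terabyte = excess_percent(2**40, 10**12)   # binary vs. decimal terabyte

for name, pct in [("dozen", baker_dozen), ("gigabyte", gigabyte), ("terabyte", terabyte)]:
    print(f"{name}: {pct:.1f}%")
```

The gap grows with each prefix because every step multiplies by 1024/1000 = 1.024 once more.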

That's true, but there are several problems with it. First, the classical KB and MB are computer terms, which do not have to match scientific usage, just as a computer mouse is not strictly a mouse. In old computers, memory addressing and similar things were very clearly based on binary, for example some block sizes on graphics cards. To gain any real speed, those pages clearly have to be sized in powers of two. Talking about a page size of 65.536 KB is nonsense, so calling it 64 KB is sensible. Since file sizes have to be simple for programmers, who are really the only ones who might talk or think about such sizes 1000 times a day and who also need exact calculations, it's normal to use such units. Users, on the other hand, should know their computer. And one more thing: 1024 is actually bigger than 1000, not smaller.--Intuite (talk) 00:04, 4 February 2009 (UTC)
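Why power-of-two sizes help: with a page size of 2**16 = 65,536 bytes, the page number and the offset of an address come from a shift and a bitmask instead of a division and a modulo. A minimal Python sketch; the page size and the address are made-up values for illustration, not taken from any particular hardware.

```python
PAGE_SHIFT = 16                 # hypothetical: 2**16-byte pages
PAGE_SIZE = 1 << PAGE_SHIFT     # 65,536 bytes

# The same page size expressed in binary and in decimal kilobytes.
print(PAGE_SIZE / 1024)         # 64.0
print(PAGE_SIZE / 1000)         # 65.536

def page_index(address: int) -> int:
    """Page number of an address: a right shift instead of a division."""
    return address >> PAGE_SHIFT

def page_offset(address: int) -> int:
    """Offset inside the page: a bitmask instead of a modulo."""
    return address & (PAGE_SIZE - 1)

addr = 200_000  # arbitrary example address
assert page_index(addr) == addr // PAGE_SIZE
assert page_offset(addr) == addr % PAGE_SIZE
print(page_index(addr), page_offset(addr))  # 3 3392
```

The shift and mask only give the right answer because the page size is an exact power of two; with 1000-byte units a real division would be needed.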

Hard drive size and size for files


It should also be noted, regarding disk drives and their sizes, that a 60 GB disk may still hold less than 60 GB of files. Any disk format needs some additional data: file names and attributes, data on where files are physically located (usually two copies for safety), and possibly boot sector data and journal data. Directories are also stored (physically) as a kind of file, but their sizes might not be counted in the "directory size"; only the sum of the file content sizes is considered. This area of the disk is completely unusable for file content. At higher levels of the operating system there are also hidden files for thumbnails and other cached data. In any case, at the lowest levels of the disk architecture, the total size of the files you can put on a disk could be considerably smaller than the disk size (this is especially true for small files). Bad sectors and other errors can also make disks smaller.--Intuite (talk) 00:04, 4 February 2009 (UTC)
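One reason small files waste so much space is that file systems typically hand out storage in fixed-size allocation units, so every file is rounded up to a whole unit. A rough Python sketch, assuming a 4096-byte allocation unit (real values depend on the file system) and ignoring all the metadata mentioned above:

```python
CLUSTER_SIZE = 4096  # assumed allocation unit in bytes; real values depend on the file system

def space_on_disk(file_size: int, cluster: int = CLUSTER_SIZE) -> int:
    """Bytes a file occupies when rounded up to whole clusters (metadata not counted)."""
    return (file_size + cluster - 1) // cluster * cluster

# 10,000 files of 100 bytes each versus a single file holding the same content.
small_files = [100] * 10_000
content = sum(small_files)                              # 1,000,000 bytes of data
allocated = sum(space_on_disk(s) for s in small_files)
print(content, allocated)                               # 1000000 40960000
print(space_on_disk(content))                           # 1003520
```

Under these assumptions the same million bytes take about 40 times as much space when split into tiny files.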

Yes, but. Just because the space is not usable by the person who buys the drive doesn't mean the storage space doesn't exist. It's there; you just can't use it. :( They tease us that way. ... IMO, WRT this article, this info is off topic. Stevebroshar (talk) 12:08, 21 December 2024 (UTC)

Article name


I really doubt that this is the right article for discussing file size units, since those units also apply to measuring memory size (memory does not contain files) and network speed. I suggest "memory size", as it's not wrong to call all disks, memories, caches, registers, USB sticks and the like memory devices. So what is kept on those devices is memory, and what is measured is also memory, whether it's a file, part of the data transferred over the internet in the last hour, a C++ object or a cache.--Intuite (talk) 00:04, 4 February 2009 (UTC)

I agree that this article title (file size) is bad. As with many WP articles, someone added an article for a term that is used in the world but does not warrant its own article. IMO, size is an attribute of a file, not a standalone concept. Articles should exist to answer the question: What is X? And the question "What is file size?" is so obvious that no one will ask it. File size is, of course, the size of a file. ... You delve into how to save the content of this article by giving it a better name. OK. What is this article about? It definitely talks about the size of files, so we can't just rename it. This article also talks about file system issues like allocation size (which arguably is not about file size), sector (which is out of date since it does not apply to SSDs) and maximum file size (not about file size IMO). So it already contains off-topic info :( ... The section "Units of information" seems to be about your idea: memory size. But Units of information covers that topic already ... even though the name of that article sucks. I've never heard of units of information. I've never wondered: what is units of information? But I don't consider memory size to be a notable term either. Some notable concepts don't have a notable name/title. ... Here's my suggestion (after just one cup of coffee): change this page to redirect to Units of information. The info here is low value. Dump it. How's that for a bold change? Stevebroshar (talk) 12:34, 21 December 2024 (UTC)

KB for 1024 is not wrong


KB as 1024 is not "technically imprecise"; it's the older system. Some companies have not (yet?) adopted the new system, and that does not mean their calculations are "imprecise". When I started computer programming, calling 1000 bytes a KB was technically imprecise, and there is actually no strong reason for all companies to change that viewpoint. I'm 100% sure that the people who called 1024 bytes a KB knew very well how that prefix is used in other fields; they were not imprecise, they were engineers, and those are two very different things. One should also look at the article about SI and its history section: KB first meant kilobyte (and Kb meant kilobit for many, just as Mbps differs from MBps), and this had a strong technical reason. For a computer function that takes its input in KB, or a physical partition that holds file data, it really would not make sense to move the data in 1000-byte units, as that makes things slower. It was primarily the time of assembly, when low level was the only level and users had to understand the workings of their computer. Time passed, high-level languages and interfaces appeared, computer users were increasingly not computer freaks but "people from the street", and computers became fast enough that, for a unit as big as a file, it was simple to do DIV 1000 instead of SHR 10. Indeed, DIV 1000, which was much slower than SHR 10 on old computers, is fast enough on today's computers that the difference no longer matters. So there were several reasons to propose using KB for 1000 bytes, but it should be taken for what it is: the old usage is right, and the new usage is right, until everyone uses the new one, including Microsoft, which is not some company on the periphery and whose people are mostly computer professionals, not imprecise ones.
It's normal in the history of science for the same term to mean one thing in one science (or field) and something completely different in another; a particle in physics and a particle of sand are not imprecise either. So this article would be neutral if it talked about the different measuring systems used today by professionals in the field, which are all exact, and linked to articles about why one system might be better than another. I myself currently prefer 1000 in common vocabulary and 1024 in programming, as I really think it would be a mess otherwise. Just changing the units in a programming language would make programs crash, and changing all the documentation would be a total mess. Actually, I don't even understand why we have a unit like the "byte", which is 8 bits. Amusingly, taking the 8-bit byte for granted would also be imprecise, since historically there were 7-bit bytes (the ASCII system used now is not the first version); going further, ASCII is slowly becoming history and UTF-8 is used instead. An 8000-bit kilobyte is also kind of nonsense, considering that 1 bit is the fundamental unit of size on all computers today and we use a base-10 system. File sizes are indeed always a whole number of 8-bit bytes, but items in memory might be 1, 2 or 4 bits, or, in some compressed files for example, 12 bits. And files are not 8-bit based on all computers. For me, a 10-bit base unit would make sense as well.--Intuite (talk) 00:06, 4 February 2009 (UTC)
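On the shift-versus-division point in the comment above: converting bytes to binary kilobytes is a right shift by 10 bits (SHR 10, since 2**10 = 1024), while kilobytes of 1000 bytes need a genuine division. A small Python sketch with hypothetical helper names; Python hides the machine instructions, so this shows the equivalence, not the speed difference.

```python
def kb_binary(size_in_bytes: int) -> int:
    """Whole binary kilobytes: a right shift by 10 divides by 2**10 = 1024."""
    return size_in_bytes >> 10

def kb_decimal(size_in_bytes: int) -> int:
    """Whole decimal kilobytes: needs a real division by 1000."""
    return size_in_bytes // 1000

size = 1_048_576  # 1 MiB
print(kb_binary(size), size // 1024, kb_decimal(size))  # 1024 1024 1048
```

A left shift (SHL 10) goes the other way, multiplying a KB count by 1024 to get bytes.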

...Actually, I personally think there could even be a system where the base unit is the meme, with 1 meme (1M) = 10**12 bits. That is 125 GB. 1 KB would then be 8 nanomemes, 1 MB would be 8 micromemes and 1 GB would be 8 millimemes (mM or mm?). Future drives will be as large as several terabytes, and one bit is really small, so calling it one picomeme is not very odd. Current hard drives would thus be 3 or more memes. So a meme is a unit of data that really contains something: as the human eye has about a million times a million pixels, as some books suggest, it's about one full-screen black-and-white image on some future computer, which is a good size for measuring data. And it's not imprecise, in the sense that you can talk about bits and really big data sizes in one coherent system. It also says a lot about how we feel about file sizes now or in the near future: a bit is really pico, and there are many billions of them in any given computer; 1/8 KB is somewhat nano, and you can store millions of those; 1/8 MB would be micro, as a movie contains about 4000 of those and a normal full-screen image contains several. 1/8 GB would soon be milli, and mega would then be really large, as our common sense says, just as kilometer and kilobyte are very different in scale right now. Tera (10**12 bits) would be a lot for a while, so there is room for growth. Having a picomeme equal 1 bit is well grounded, as pico is the smallest well-known prefix and the bit is the smallest portion of data in a computer (actually the smallest portion of data logically possible, which grounds the bit too). BTW, a current hard drive would be about 4000 millimemes, or 40 decimemes :) --Intuite (talk) 00:04, 4 February 2009 (UTC)
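The conversions in this proposal can be checked exactly with Python's `fractions` module. The "meme" is the hypothetical unit proposed in the comment (10**12 bits), not a standard unit; KB, MB and GB are taken as decimal (10**3, 10**6, 10**9 bytes), as the comment's arithmetic implies, and the 500 GB drive is only an example size.

```python
from fractions import Fraction

BITS_PER_MEME = 10**12  # the proposed (hypothetical) "meme": 10**12 bits

PREFIX = {"pico": Fraction(1, 10**12), "nano": Fraction(1, 10**9),
          "micro": Fraction(1, 10**6), "milli": Fraction(1, 10**3),
          "deci": Fraction(1, 10), "": Fraction(1)}

def bytes_to(prefix: str, size_in_bytes: int) -> Fraction:
    """Express a byte count (8 bits per byte) in <prefix>memes, exactly."""
    memes = Fraction(size_in_bytes * 8, BITS_PER_MEME)
    return memes / PREFIX[prefix]

print(bytes_to("nano", 10**3))    # 1 KB   -> 8 nanomemes
print(bytes_to("micro", 10**6))   # 1 MB   -> 8 micromemes
print(bytes_to("milli", 10**9))   # 1 GB   -> 8 millimemes
print(bytes_to("", 125 * 10**9))  # 125 GB -> 1 meme
print(bytes_to("pico", 1) / 8)    # 1 bit  -> 1 picomeme
print(bytes_to("deci", 500 * 10**9), bytes_to("milli", 500 * 10**9))  # 40 4000
```

Using `Fraction` instead of floats keeps values like 8 nanomemes exact rather than approximately 8.000000000000002.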

Byte size


http://www.answers.com/topic/byte - answers.com gives a good definition of byte (or see http://www.yourdictionary.com/byte).

So the real definition of a byte is that it is a "basic unit of a digital computer". On the PC this basic unit is 8 bits nowadays, but it used to be 7. A byte is the unit in which files or other data are measured on that specific computer. Of course, you can simulate different data types anywhere, but on the PC most hardware data (except flags, which are still usually contained in groups of n*8 bits) is kept in memory areas of n*8 bits, where n is an integer. When data is kept in smaller units, it is called "compressed", and each compressed unit is usually still contained in an n*8-bit memory area, at least when manipulated with the assembler's basic instructions.

So saying that 1 KB must be 1000 bytes is correct, but measuring hardware sizes in bytes is correct only as long as you are using a computer that uses 8-bit characters. When UTF-8 becomes more widespread, it's possible that 32-bit bytes will be used instead, and this would make a KB 4 times larger. So KB is a system-specific notation. Saying that a byte is 8 bits would be as good as saying that a word is 32 bits (or worse, that integers are 32 bits): words have been both 16 and 32 bits depending on whether the architecture is 16- or 32-bit, and on 64-bit architectures many languages might start to refer to 64-bit integers as "words". So "word" simply means some data type that is native enough for the computer to do calculations on, and "byte" mostly means the smallest size of a non-empty file or system-managed memory area.--Intuite (talk) 21:16, 4 February 2009 (UTC)

Lots to unpack ... The simulation and compression stuff makes no sense to me. I'll skip that. ... UTF-8 is about encoding text in 8-bit chunks. It makes no sense to say that 32-bit bytes would be used for UTF-8; that would be UTF-32, which is a thing. Notice that the naming references bit size, not byte size. So I don't get your point about Unicode. ... The term "word" does have different sizes on different hardware. Word is used for the basic unit of storage on a particular hardware, not "byte". Byte remains 8 bits. OK, so there were 7-bit bytes in the past. That is now obsolete. It's 8 now ... and seemingly forever. But who knows what the future brings. We can't be responsible for being accurate tomorrow, only today. ... I recently removed the section from this article on byte size and whatnot. That stuff is covered in units of information. Stevebroshar (talk) 07:40, 23 December 2024 (UTC)
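To illustrate the reply's point that UTF-8 and UTF-32 are named for the size of their code units while the byte itself stays 8 bits: a short Python sketch that encodes a few arbitrarily chosen characters and counts the resulting 8-bit bytes. UTF-8 uses 1 to 4 bytes per character; UTF-32 always uses one 32-bit unit (4 bytes).

```python
# Sample characters: 'A', 'e' with acute accent, the euro sign, and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf32 = ch.encode("utf-32-le")  # "-le" avoids the 4-byte byte-order mark
    print(f"U+{ord(ch):04X}: {len(utf8)} byte(s) in UTF-8, {len(utf32)} in UTF-32")
```

In both encodings the result is still counted in ordinary 8-bit bytes; only the number of bytes per character differs.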

A bold reduction


I did remove much of the content. It was a bold change. My justification is that file size is for the most part an obvious concept. It's the size of a file!

I removed the section on units of information since it's covered in detail on its own page. The intro of this article (file size) says that file size is typically measured in bytes and links to that article. That's all this article needs to say to cover units of measure/information.

I removed the maximum size section since it was a tiny excerpt from Comparison of file systems. It added no additional value and just duplicated random parts of that article. Stevebroshar (talk) 13:53, 21 December 2024 (UTC)