
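Question: say I delete the extension from a docx document so only the filename is left. Is there a tool or method that can work out what format the file is?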

    yuann72 · Apr 22, 2016 · 2826 views

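    Question: say I delete the extension from a docx document so only the filename is left. Is there a tool or method that can work out what format the file originally was?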

    10 replies    2016-04-22 15:05:18 +08:00
    #1 · msg7086 · Apr 22, 2016
    Linux has the file command: http://linux.die.net/man/1/file
    #2 · gdtv · Apr 22, 2016 via Android
    I think the first few bytes of a file record its type, but that's just a guess.
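As gdtv guesses, many formats do begin with a fixed byte sequence (a "magic number"). A minimal Python sketch of the idea, assuming a small hand-picked signature table; the table, `sniff_bytes` and `sniff_file` are illustrative names, not taken from any tool in this thread, and real tools such as file ship a far larger database:

```python
from pathlib import Path
from typing import Union

# (signature bytes, description): an illustrative subset, not exhaustive
SIGNATURES = [
    (b"PK\x03\x04", "ZIP container (also DOCX/XLSX/PPTX, JAR, EPUB, ...)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy DOC/XLS/PPT)"),
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
]


def sniff_bytes(head: bytes) -> str:
    """Return the description of the first signature that prefixes head."""
    for magic, description in SIGNATURES:
        if head.startswith(magic):
            return description
    return "unknown"


def sniff_file(path: Union[str, Path]) -> str:
    # 16 bytes covers every signature in the table above
    with open(path, "rb") as f:
        return sniff_bytes(f.read(16))
```

For example, sniff_bytes(b"PK\x03\x04\x14\x00\x06\x00") returns the ZIP entry, which on its own cannot tell a DOCX from any other ZIP archive.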
    #3 · msg7086 · Apr 22, 2016
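    @gdtv Yes, that's the magic number in the header. Some tools, though, also detect the type from the actual content.
    For example, a docx is a zip file, but reporting it as zip isn't very useful, so a tool may go on to inspect the structure of the XML files inside it to determine the specific file type.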
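msg7086's two-stage idea, sketched in Python under stated assumptions: once the header says ZIP, list the archive members and look for the main part Office normally writes. The part names below are the usual defaults rather than a guarantee (the package's _rels/.rels is what actually points at the main part), and `refine_zip` is a hypothetical helper:

```python
import io
import zipfile

# Part names Office normally writes for the main document. Illustrative only:
# a robust tool would follow _rels/.rels instead of relying on fixed paths.
OOXML_MAIN_PARTS = {
    "word/document.xml": "DOCX (Word)",
    "xl/workbook.xml": "XLSX (Excel)",
    "ppt/presentation.xml": "PPTX (PowerPoint)",
}


def refine_zip(data: bytes) -> str:
    """Second stage: the bytes already look like ZIP; look inside for a known layout."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return "corrupt or truncated ZIP"
    for part, label in OOXML_MAIN_PARTS.items():
        if part in names:
            return label
    return "plain ZIP archive"


if __name__ == "__main__":
    # Build a tiny DOCX-like archive in memory to exercise the check
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<w:document/>")
    print(refine_zip(buf.getvalue()))  # DOCX (Word)
```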
    #4 · imn1 · Apr 22, 2016
    50 4B 03 04 14 00 06 00

    DOCX, PPTX, XLSX

    Microsoft Office Open XML Format (OOXML) Document
    NOTE: There is no subheader for MS OOXML files as there is with DOC, PPT, and XLS files. To better understand the format of these files, rename any OOXML file to have a .ZIP extension and then unZIP the file; look at the resultant file named [Content_Types].xml to see the content types. In particular, look for the <Override PartName= tag, where you will find word, ppt, or xl, respectively.
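The manual rename-and-unzip check in the note can be done programmatically. A minimal sketch, assuming Python's standard zipfile and xml.etree.ElementTree; `ooxml_kind` is a hypothetical helper, and like the note it only looks at the first folder of each Override PartName:

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Optional

# Namespace used by [Content_Types].xml in OPC/OOXML packages
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# First folder of an Override PartName -> format, as the note describes
PREFIX_TO_FORMAT = {"word": "DOCX", "ppt": "PPTX", "xl": "XLSX"}


def ooxml_kind(data: bytes) -> Optional[str]:
    """Return 'DOCX', 'PPTX' or 'XLSX' based on [Content_Types].xml, else None."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            xml_bytes = zf.read("[Content_Types].xml")
    except (zipfile.BadZipFile, KeyError):
        return None  # not a ZIP, or a ZIP without a content-types part
    root = ET.fromstring(xml_bytes)
    for override in root.iter(CT_NS + "Override"):
        # PartName looks like "/word/document.xml"; keep only "word"
        part_name = override.get("PartName", "")
        top_folder = part_name.lstrip("/").split("/", 1)[0]
        if top_folder in PREFIX_TO_FORMAT:
            return PREFIX_TO_FORMAT[top_folder]
    return None
```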

    Trailer: Look for 50 4B 05 06 (PK..) followed by 18 additional bytes at the end of the file.
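A sketch of checking both the header and the trailer described above, assuming the whole file is already in memory (`has_zip_header_and_trailer` is a hypothetical name). Only the first four header bytes are compared, because the next four (14 00 06 00 here) are the version-needed field (0x14 = 20, i.e. v2.0) and flag bits, which vary between files. The 18 bytes after 50 4B 05 06 end with a comment-length field, and an optional archive comment may follow them, so the sketch reads that field instead of assuming the file always ends exactly 18 bytes after the signature:

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIG = b"PK\x03\x04"   # 50 4B 03 04
EOCD_SIG = b"PK\x05\x06"           # 50 4B 05 06
EOCD_FIXED_LEN = 22                # 4-byte signature + 18 bytes of fields
MAX_COMMENT_LEN = 0xFFFF           # optional archive comment after those fields


def has_zip_header_and_trailer(data: bytes) -> bool:
    """True if data starts with a local file header and ends with a consistent EOCD."""
    if not data.startswith(LOCAL_HEADER_SIG):
        return False
    # The end-of-central-directory record lies within the last 22 + 65535 bytes
    search_from = max(0, len(data) - EOCD_FIXED_LEN - MAX_COMMENT_LEN)
    idx = data.rfind(EOCD_SIG, search_from)
    if idx < 0 or len(data) - idx < EOCD_FIXED_LEN:
        return False
    # Last fixed field: little-endian uint16 comment length at offset 20
    (comment_len,) = struct.unpack_from("<H", data, idx + 20)
    # With no comment, the file ends exactly 18 bytes after the signature
    return idx + EOCD_FIXED_LEN + comment_len == len(data)


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<w:document/>")
    data = buf.getvalue()
    print(has_zip_header_and_trailer(data))       # True
    print(has_zip_header_and_trailer(data[:-1]))  # False (truncated trailer)
```

Because rfind takes the last 50 4B 05 06 it sees, an archive comment that itself contains those bytes would make this sketch report False.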
    #5 · slixurd · Apr 22, 2016
    But it won't necessarily identify the file correctly, for example file on OS X.
    Two files:
    **.doc: CDF V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 936, Title: ??????????ѧ?ڶ???ڶ??????????ϻ?, Author: ????ľ??, Template: Normal.dot, Last Saved By: ΢???û?, Revision Number: 15, Name of Creating Application: Microsoft Office Word, Total Editing Time: 01:43:00, Create Time/Date: Sat Aug 27 16:06:00 2011, Last Saved Time/Date: Thu Sep 1 02:24:00 2011, Number of Pages: 29, Number of Words: 1789, Number of Characters: 10203, Security: 0

    ➜ file **.docx
    **.docx: Zip archive data, at least v2.0 to extract
    #6 · neutrino · Apr 22, 2016
    #7 · shoaly · Apr 22, 2016
    Search for a program called filetypeid (there's a Windows version); drag the file into it and it will work out what format it is.
    #8 · clino · Apr 22, 2016
    @slixurd It works for me on Linux:
    $ file test.docx
    test.docx: Microsoft Word 2007+
    $ file test.xlsx
    test.xlsx: Microsoft Excel 2007+
    #9 · shiny · Apr 22, 2016
    Open it in iHex and look at the markers in the header; you can infer the format from them.
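Without iHex at hand, a rough stand-in for reading the header takes a few lines of Python. This is a sketch only: `hexdump`, `dump_file_head` and the 16-bytes-per-row layout are arbitrary choices, not iHex's display format. For any ZIP-based file, DOCX included, the first row begins with 50 4B 03 04, shown as "PK" in the ASCII column:

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Offset, hex bytes and a printable-ASCII column, one row per `width` bytes."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as '.', as most hex viewers do
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(rows)


def dump_file_head(path, n: int = 64) -> str:
    """Hex-dump only the first n bytes, where magic numbers usually sit."""
    with open(path, "rb") as f:
        return hexdump(f.read(n))
```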
    #10 · Frown · Apr 22, 2016
    TrIDNet, or just drag the file into a text editor.