1. 18 Aug, 2021 1 commit
  2. 16 Mar, 2019 1 commit
  3. 31 Jan, 2018 1 commit
  4. 29 Jan, 2018 1 commit
  5. 27 Jan, 2018 1 commit
  6. 25 Jan, 2018 1 commit
  7. 24 Jan, 2018 1 commit
  8. 19 Dec, 2017 4 commits
    • fix iconv output of surrogate pairs in ucs2 · 628cf979
      Rich Felker authored
      in the unified code for handling utf-16 and ucs2 output, the check for
      ucs2 wrongly looked at the source charset rather than the destination
      charset.
      628cf979
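To illustrate the distinction this fix relies on, here is a minimal sketch (not musl's code) of a shared UTF-16/UCS-2 output routine. The helper name `put_utf16be` and the `dest_is_ucs2` flag are hypothetical. The point is that whether a surrogate pair may be emitted depends on the destination charset, not the source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not musl's code: write code point c as big-endian
 * UTF-16 (or UCS-2 when dest_is_ucs2 is nonzero) into out, which must have
 * room for 4 bytes. Returns bytes written, or 0 if c is not representable. */
static size_t put_utf16be(uint32_t c, int dest_is_ucs2, unsigned char *out)
{
    if (c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff)) return 0;
    if (c < 0x10000) {
        out[0] = c >> 8;
        out[1] = c & 0xff;
        return 2;
    }
    /* Only the destination charset decides whether a pair is allowed. */
    if (dest_is_ucs2) return 0;
    c -= 0x10000;
    uint32_t hi = 0xd800 | (c >> 10);
    uint32_t lo = 0xdc00 | (c & 0x3ff);
    out[0] = hi >> 8; out[1] = hi & 0xff;
    out[2] = lo >> 8; out[3] = lo & 0xff;
    return 4;
}
```

A caller converting to UCS-2 would treat a return of 0 as an unrepresentable character, while a UTF-16 destination receives the four-byte pair.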
    • add support for BOM-determined-endian UCS2, UTF-16, and UTF-32 to iconv · 95c6044e
      Rich Felker authored
      previously, the charset names without endianness specified were always
      interpreted as big endian. unicode specifies that UTF-16 and UTF-32
      have BOM-determined endianness if BOM is present, and are otherwise
      big endian. since commit 5b546faa
      added support for stateful encodings, it is now possible to implement
      BOM support via the conversion descriptor state.
      
      for conversions to these charsets, the output is always big endian and
      does not have a BOM.
      95c6044e
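The decoding rule described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: the state enum and the function `get_utf16_unit` are hypothetical, and the real state lives in the conversion descriptor.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state values; musl keeps the real state in the descriptor. */
enum { ENDIAN_UNKNOWN, ENDIAN_BIG, ENDIAN_LITTLE };

/* Read one 16-bit unit from in[0..1]. On the first unit, a BOM selects the
 * byte order and is consumed (the function returns -1). Without a BOM, big
 * endian is assumed and the unit is decoded normally. */
static long get_utf16_unit(const unsigned char *in, int *state)
{
    if (*state == ENDIAN_UNKNOWN) {
        if (in[0] == 0xfe && in[1] == 0xff) { *state = ENDIAN_BIG; return -1; }
        if (in[0] == 0xff && in[1] == 0xfe) { *state = ENDIAN_LITTLE; return -1; }
        *state = ENDIAN_BIG;
    }
    if (*state == ENDIAN_LITTLE) return in[0] | in[1] << 8;
    return in[0] << 8 | in[1];
}
```

The caller starts with `ENDIAN_UNKNOWN` and skips any unit reported as -1.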
    • add cp866 (dos cyrillic) to iconv · 9d4d0ee4
      Rich Felker authored
      9d4d0ee4
    • update case mappings to unicode 10.0 · 54941edd
      Rich Felker authored
      the mapping tables and code are not automatically generated; they were
      produced by comparing the output of towupper/towlower against the
      mappings in the UCD, ignoring characters that were previously excluded
      from case mappings or from alphabetic status (micro sign and circled
      letters), and adding table entries or code for everything else
      missing.
      
      based very loosely on a patch by Reini Urban.
      54941edd
  9. 18 Dec, 2017 2 commits
  10. 15 Dec, 2017 6 commits
  11. 14 Dec, 2017 1 commit
    • fix data race in at_quick_exit · 64303156
      Rich Felker authored
      aside from theoretical arbitrary results due to UB, this could
      practically cause unbounded overflow of static array if hit, but
      hitting it depends on having more than 32 calls to at_quick_exit and
      having them sufficiently often.
      64303156
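As a rough illustration of the kind of race involved, the sketch below registers handlers in a fixed array of 32 slots and performs both the bounds check and the increment under a lock. It is an assumption-laden sketch, not the actual patch: `register_quick_exit` and the use of a pthread mutex are hypothetical stand-ins for the library's internal locking.

```c
#include <pthread.h>

#define COUNT 32

static void (*funcs[COUNT])(void);
static int count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical registration function. Without the lock, two threads could
 * both pass the bounds check and then both increment count, which is how
 * an unsynchronized version could run past the end of the array. */
static int register_quick_exit(void (*func)(void))
{
    int r = 0;
    pthread_mutex_lock(&lock);
    if (count == COUNT) r = -1;
    else funcs[count++] = func;
    pthread_mutex_unlock(&lock);
    return r;
}
```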
  12. 12 Dec, 2017 1 commit
  13. 11 Dec, 2017 1 commit
    • implement strftime padding specifier extensions · 8a6bd730
      Timo Teräs authored
      notes added by maintainer:
      
      the '-' specifier allows default padding to be suppressed, and '_'
      allows padding with spaces instead of the default (zeros).
      
      these extensions seem to be included in several other implementations
      including FreeBSD and derivatives, and Solaris. while portable
      software should not depend on them, time format strings are often
      exposed to the user for configurable time display. reportedly some
      python programs also use and depend on them.
      8a6bd730
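A small usage sketch of the two flags follows. Since they are extensions, the result depends on the C library in use; the helper name `format_day` is hypothetical.

```c
#include <stddef.h>
#include <time.h>

/* Format the day of month three ways: default zero padding, no padding
 * ("%-d"), and space padding ("%_d"). The last two are extensions. */
static size_t format_day(char *buf, size_t n, int mday)
{
    struct tm tm = {0};
    tm.tm_mday = mday;
    return strftime(buf, n, "%d|%-d|%_d", &tm);
}
```

On an implementation that supports the extensions, day 5 is expected to come out as `05`, `5`, and a space followed by `5`, separated by `|`.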
  14. 06 Dec, 2017 2 commits
    • adjust fopencookie structure tag for ABI-compat · 2488d31f
      Rich Felker authored
      stdio types use the struct tag names from glibc libio to match C++
      ABI.
      2488d31f
    • implement the fopencookie extension to stdio · 06184334
      William Pitcock authored
      notes added by maintainer:
      
      this function is a GNU extension. it was chosen over the similar BSD
      function funopen because the latter depends on fpos_t being an
      arithmetic type as part of its public API, conflicting with our
      definition of fpos_t and with the intent that it be an opaque type. it
      was accepted for inclusion because, despite not being widely used, it
      is usually very difficult to extricate software using it from the
      dependency on it.
      
      calling pattern for the read and write callbacks is not likely to
      match glibc or other implementations, but should work with any
      reasonable callbacks. in particular the read function is never called
      without at least one byte being needed to satisfy its caller, so that
      spurious blocking is not introduced.
      
      contracts for what callbacks called from inside libc/stdio can do are
      always complicated, and at some point still need to be specified
      explicitly. at the very least, the callbacks must return or block
      indefinitely (they cannot perform nonlocal exits) and they should not
      make calls to stdio using their own FILE as an argument.
      06184334
  15. 20 Nov, 2017 2 commits
    • make fgetwc handling of encoding errors consistent with/without buffer · 4000b010
      Rich Felker authored
      previously, fgetwc left all but the first byte of an illegal sequence
      unread (available for subsequent calls) when reading out of the FILE
      buffer, but dropped all bytes contributing to the error when falling
      back to reading a byte at a time. neither behavior was ideal. in the
      buffered case, each malformed character produced one error per byte,
      rather than one per character. in the unbuffered case, consuming the
      last byte that caused the transition from "incomplete" to "invalid"
      state potentially dropped (and produced additional spurious encoding
      errors for) the next valid character.
      
      to handle both cases uniformly without duplicate code, revise the
      buffered case to only cover situations where a complete and valid
      character is present in the buffer, and fall back to byte-at-a-time
      for all other cases. this allows using mbtowc (stateless) instead of
      mbrtowc, which may slightly improve performance too.
      
      when an encoding error has been hit in the byte-at-a-time case, leave
      the final byte that produced the error unread (via ungetc) except in
      the case of single-byte errors (for UTF-8, bytes c0, c1, f5-ff, and
      continuation bytes with no lead byte). single-byte errors are fully
      consumed so as not to leave the caller in an infinite loop repeating
      the same error.
      
      none of these changes are distinguished from a conformance standpoint,
      since the file position is unspecified after encoding errors. they are
      intended merely as QoI/consistency improvements.
      4000b010
    • fix treatment by fgetws of encoding errors as eof · a90d9da1
      Rich Felker authored
      fgetwc does not set the stream's error indicator on encoding errors,
      making ferror insufficient to distinguish between error and eof
      conditions. feof is also insufficient, since it will return true if
      the file ended with a partial character encoding error.
      
      whether fgetwc should be setting the error indicator itself is a
      question with conflicting answers. the POSIX text for the function
      states it as a requirement, but the ISO C text seems to require that
      it not. this may be revisited in the future based on the outcome of
      Austin Group issue #1170.
      a90d9da1
  16. 19 Nov, 2017 1 commit
  17. 15 Nov, 2017 1 commit
    • add reverse iconv mappings for JIS-based encodings · a223dbd2
      Rich Felker authored
      these encodings are still commonly used in messaging protocols and
      such. the reverse mapping is implemented as a binary search of a list
      of the jis 0208 characters in unicode order; the existing forward
      table is used to perform the comparison in the search.
      a223dbd2
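The lookup scheme described above can be sketched with toy data. The tables below are hypothetical and tiny; only the technique (a list of charset codes kept in Unicode order, searched by comparing through the forward table) follows the description.

```c
#include <stddef.h>

/* Toy forward table: the index is a (hypothetical) charset code and the
 * value is its Unicode code point. Real tables are far larger. */
static const unsigned short fwd[] = { 0x3042, 0x3000, 0x30a2, 0x3001 };

/* Charset codes listed in increasing order of their Unicode value. */
static const unsigned char rev[] = { 1, 3, 0, 2 };

/* Return the charset code for code point c, or -1 if it is unmapped.
 * Each comparison goes through fwd[], so no separate table of Unicode
 * values is needed for the reverse direction. */
static int reverse_lookup(unsigned c)
{
    size_t lo = 0, hi = sizeof rev / sizeof *rev;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        unsigned u = fwd[rev[mid]];
        if (u == c) return rev[mid];
        if (u < c) lo = mid + 1;
        else hi = mid;
    }
    return -1;
}
```

The reverse list costs only one small index per character, at the price of an extra table access per search step.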
  18. 13 Nov, 2017 2 commits
    • generalize iconv framework for 8-bit codepages · 105eff9d
      Rich Felker authored
      previously, 8-bit codepages could only remap the high 128 bytes; the
      low range was assumed/forced to agree with ascii. interpretation of
      codepage table headers has been changed so that it's possible to
      represent mappings for up to 256 slots (fewer if the initial portion
      of the map is elided because it coincides with unicode codepoints).
      this requires consuming a bit more of the 10-bit space of characters
      that can be represented in 8-bit codepages, but there's still plenty
      left. the size of the legacy_chars table is actually reduced now by
      eliding the first 256 entries and considering them to map implicitly
      via the identity map.
      
      before these changes, there seem to have been minor bugs/omissions in
      codepage table generation, so it's likely that some actual bug fixes
      are silently included in this commit. round-trip testing of a few
      codepages was performed on the new version of the code, but no
      differential testing against the old version was done.
      105eff9d
    • fix malloc state corruption when ldso rejects loading a second libc · a71b46cf
      Rich Felker authored
      commit c49d3c8a added logic to detect
      attempts to load libc.so via another name and instead redirect to the
      existing libc, rather than loading two and producing dangerously
      inconsistent state. however, the check for and unmapping of the
      duplicate libc happened after reclaim_gaps was already called,
      donating the slack space around the writable segment to malloc.
      subsequent unmapping of the library then invalidated malloc's free
      lists.
      
      fix the issue by moving the call to reclaim_gaps out of map_library
      into load_library, after the duplicate libc check but before the first
      call to calloc, so that the gaps can still be used to satisfy the
      allocation of struct dso. this change also eliminates the need for an
      ugly hack (temporarily setting runtime=1) to avoid reclaim_gaps when
      loading the main program via map_library, which happens when ldso is
      invoked as a command.
      
      only programs/libraries erroneously containing a DT_NEEDED reference
      to libc.so via an absolute pathname or symlink were affected by this
      issue.
      a71b46cf
  19. 11 Nov, 2017 2 commits
    • reformat cjk iconv tables to be diff-friendly, match tool output · d060edf6
      Rich Felker authored
      the new version of the code used to generate these tables forces a
      newline every 256 entries, whereas at the time these files were
      originally generated and committed, it only wrapped them at 80
      columns. the new behavior ensures that localized changes to the
      tables, if they are ever needed, will produce localized diffs. other
      tables including hkscs were already committed in the new format.
      
      binary comparison of the generated object files was performed to
      confirm that no spurious changes slipped in.
      d060edf6
    • prevent fork's errno from being clobbered by atfork handlers · c21051e9
      Bobby Bingham authored
      If the syscall fails, errno must be set correctly for the caller.
      There's no guarantee that the handlers registered with pthread_atfork
      won't clobber errno, so we need to ensure it gets set after they are
      called.
      c21051e9
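The hazard can be shown in isolation with a small sketch. It uses a save-and-restore approach, which is one possible way to get the same guarantee and is not necessarily how the fix itself is written. All names here (`call_then_notify`, `failing_op`, `noisy_handler`) are hypothetical stand-ins for the syscall and the registered handlers.

```c
#include <errno.h>

/* Stand-in for a failing syscall wrapper. */
static int failing_op(void) { errno = EAGAIN; return -1; }

/* Stand-in for a handler that happens to overwrite errno. */
static void noisy_handler(int ret) { (void)ret; errno = 0; }

/* Run op, then the handler, and make sure the caller still sees the errno
 * that op produced when op failed. */
static int call_then_notify(int (*op)(void), void (*handler)(int))
{
    int ret = op();
    int saved = errno;
    handler(ret);
    if (ret < 0) errno = saved;
    return ret;
}
```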
  20. 10 Nov, 2017 8 commits
    • add iso-2022-jp support (decoding only) to iconv · a39f20bf
      Rich Felker authored
      this implementation aims to match the baseline defined by rfc1468 (the
      original mime charset definition) plus the halfwidth katakana
      extension included in the whatwg definition of the charset. rejection
      of si/so controls and newlines in doublebyte state are not currently
      enforced. the jis x 0201 mode is currently interpreted as having the
      yen sign and overline character in place of backslash and tilde; ascii
      mode has the standard ascii characters in those slots.
      a39f20bf
    • add iconv framework for decoding stateful encodings · 5b546faa
      Rich Felker authored
      assuming pointers obtained from malloc have some nonzero alignment,
      repurpose the low bit of iconv_t as an indicator that the descriptor
      is a stateless value representing the source and destination character
      encodings.
      5b546faa
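The low-bit tagging idea can be sketched as below. The bit layout (bit 0 as the tag, 15 bits for the destination index, the rest for the source index) and every name are assumptions made for illustration; they are not taken from the actual code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocated descriptor for encodings that need state. */
struct stateful_cd { unsigned from, to, state; };

/* Stateless descriptor: both charset indices packed into the pointer value
 * itself, with bit 0 set as the tag. Assumes to < 0x8000. */
static void *make_stateless_cd(unsigned from, unsigned to)
{
    return (void *)(((uintptr_t)from << 16) | ((uintptr_t)to << 1) | 1);
}

/* Stateful descriptor: a real allocation, whose address has bit 0 clear
 * because malloc returns suitably aligned memory. */
static void *make_stateful_cd(unsigned from, unsigned to)
{
    struct stateful_cd *s = malloc(sizeof *s);
    if (s) { s->from = from; s->to = to; s->state = 0; }
    return s;
}

static int cd_is_stateless(void *cd) { return (uintptr_t)cd & 1; }
static unsigned cd_from(void *cd) { return (uintptr_t)cd >> 16; }
static unsigned cd_to(void *cd) { return ((uintptr_t)cd >> 1) & 0x7fff; }
```

With this scheme, only descriptors created by `make_stateful_cd` ever need to be freed.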
    • simplify/optimize iconv utf-8 case · 0df5b39a
      Rich Felker authored
      the special case where mbrtowc returns 0 but consumed 1 byte of input
      does not need to be considered, because the short-circuit for low
      bytes already covered that case.
      0df5b39a
    • handle ascii range individually in each iconv case · 9eb6dd51
      Rich Felker authored
      short-circuiting low bytes before the switch precluded support for
      character encodings that don't coincide with ascii in this range. this
      limitation affected iso-2022 encodings, which use the esc byte to
      introduce a shift sequence, and things like ebcdic.
      9eb6dd51
    • move iconv_close to its own translation unit · bff59d13
      Rich Felker authored
      this is in preparation to support stateful conversion descriptors,
      which are necessarily allocated and thus must be freed in iconv_close.
      putting it in a separate TU will avoid pulling in free if iconv_close
      is not referenced.
      bff59d13
    • refactor iconv conversion descriptor encoding/decoding · 79f49eff
      Rich Felker authored
      this change is made to avoid having assumptions about the encoding
      spread out across the file, and to facilitate future change to a form
      that can accommodate allocated, stateful descriptors when needed.
      
      this commit should not produce any functional changes; with the
      compiler tested the only change to code generation was minor
      reordering of local variables on stack.
      79f49eff
    • fix getaddrinfo error code for non-numeric service with AI_NUMERICSERV · 30fdda6c
      A. Wilcox authored
      If AI_NUMERICSERV is specified and a numeric service was not provided,
      POSIX mandates getaddrinfo return EAI_NONAME. EAI_SERVICE is only for
      services that cannot be used on the specified socket type.
      30fdda6c
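A caller-side check of this behavior might look like the following sketch. The helper name `lookup_numeric_service` is hypothetical. It uses a numeric loopback address with AI_NUMERICHOST so that no name resolution is involved.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netdb.h>

/* Resolve a numeric loopback address with the given service string while
 * forbidding service-name lookup. Returns getaddrinfo's result code. */
static int lookup_numeric_service(const char *service)
{
    struct addrinfo hints = {0}, *res = NULL;
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;
    int r = getaddrinfo("127.0.0.1", service, &hints, &res);
    if (r == 0) freeaddrinfo(res);
    return r;
}
```

With a conforming implementation, `"80"` should succeed and a name such as `"http"` should yield EAI_NONAME rather than EAI_SERVICE.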
    • fix mismatched type of __pthread_tsd_run_dtors weak definition · 67b29947
      Rich Felker authored
      commit a6054e3c changed this function
      not to take an argument, but the weak definition used by timer_create
      was not updated to match.
      
      reported by Pascal Cuoq.
      67b29947