Commit 992ba476 authored by dearblue

Fix broken UTF-8 characters by `IO#getc`

A multi-byte UTF-8 character is corrupted by `IO#getc` when its bytes
span an `IO::BUF_SIZE` (4096 bytes) buffer boundary.
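
The boundary arithmetic can be checked with a minimal sketch. It assumes CRuby with UTF-8 source encoding and the 4096-byte buffer size stated above; the local names (`BUF_SIZE`, `data`, `first_chunk`, `rest`) are illustrative and not taken from mruby.

```ruby
# Illustrative only: shows why a 4096-byte read splits a 3-byte character.
BUF_SIZE = 4096               # assumed to equal IO::BUF_SIZE, as stated above

data = "●" * 1370             # 1370 characters, 3 bytes each => 4110 bytes
char_bytes = "●".bytesize     # => 3

# 4096 is not a multiple of 3, so one byte of the next character
# ends up at the tail of the first chunk.
whole, leftover = BUF_SIZE.divmod(char_bytes)  # => [1365, 1]

first_chunk = data.byteslice(0, BUF_SIZE)                     # ends with "\xE2"
rest        = data.byteslice(BUF_SIZE, data.bytesize - BUF_SIZE)  # starts with "\x97\x8F"

p [whole, leftover]
p first_chunk.valid_encoding?   # => false
p rest.valid_encoding?          # => false
```

Neither chunk is valid UTF-8 on its own, so a reader that decodes each chunk independently yields the three stray bytes in place of the 1366th character.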

- Prepare file:

  ```ruby
  File.open("sample", "wb") { |f| f << "●" * 1370 }
  ```

- Before the patch:

  ```ruby
  File.open("sample") { |f| a = []; while ch = f.getc; a << ch; end; p a }
  # => ["●", "●", ..., "●", "\xe2", "\x97", "\x8f", "●", "●", "●", "●"]
  ```

- After the patch:

  ```ruby
  File.open("sample") { |f| a = []; while ch = f.getc; a << ch; end; p a }
  # => ["●", "●", ..., "●", "●", "●", "●", "●", "●"]
  ```

parent 7cc8c7d2
@@ -170,8 +170,14 @@ class IO
   end
   def _read_buf
-    return @buf if @buf && @buf.bytesize > 0
-    @buf = sysread(BUF_SIZE)
+    return @buf if @buf && @buf.bytesize >= 4 # maximum UTF-8 character is 4 bytes
+    @buf ||= ""
+    begin
+      @buf += sysread(BUF_SIZE)
+    rescue EOFError => e
+      raise e if @buf.empty?
+    end
+    @buf
   end
   def ungetc(substr)
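
The refill strategy in the hunk above can be exercised outside mruby with a small stand-in. This is a sketch under stated assumptions: `TinyReader` and its `getc` are hypothetical and are not mruby's `IO` implementation; only `read_buf` follows the patched `_read_buf` logic. It runs on CRuby and uses `StringIO` in place of a file.

```ruby
require "stringio"

# Hypothetical stand-in for a buffered reader; not mruby's IO class.
class TinyReader
  BUF_SIZE = 4096 # assumed to match IO::BUF_SIZE

  def initialize(io)
    @io = io
    @buf = nil
  end

  # Follows the patched _read_buf: refill whenever fewer than 4 bytes
  # remain, appending to the leftover bytes instead of replacing them.
  def read_buf
    return @buf if @buf && @buf.bytesize >= 4
    @buf ||= "".b
    begin
      @buf += @io.sysread(BUF_SIZE)
    rescue EOFError => e
      # Leftover bytes are still returned; only an empty buffer means EOF.
      raise e if @buf.empty?
    end
    @buf
  end

  # Simplified getc (an assumption for this sketch): decode the first
  # character of the buffer as UTF-8 and drop its bytes from the buffer.
  def getc
    buf = read_buf
    ch = buf.dup.force_encoding(Encoding::UTF_8)[0]
    @buf = buf.byteslice(ch.bytesize..-1)
    ch
  rescue EOFError
    nil
  end
end

reader = TinyReader.new(StringIO.new("●" * 1370))
chars = []
while (ch = reader.getc)
  chars << ch
end
p chars.size            # => 1370
p chars.uniq            # => ["●"]
```

Because a UTF-8 character is at most 4 bytes, keeping at least 4 bytes buffered (except at end of input) means the first character in the buffer is always complete before it is decoded.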